Set sharding keys for feature category `importers` tables

About

As part of Cells preparation, all tables need to have a "sharding key" defined.

This issue was created from !152751 (merged) where we set the sharding_key_issue_url for some feature category importers tables to point to this issue, as a temporary step to allow us to schedule the work into a milestone.

If we have questions or concerns, we can reach out to #g_tenant-scale.

The below description was copied from !152751 (merged).

Task

Sharding keys need to be set for the tables:

  • bulk_imports
    • Sharding keys: TBD
    • Note: It's currently classified as gitlab_main_clusterwide but we should change it to gitlab_main_cell and determine what sharding keys to use. There's a separate issue to work on this: #499829 (closed)
  • bulk_import_exports
    • Sharding keys: group_id/project_id
    • MR: !168411 (merged)
    • Directly linked to group_id/project_id already
  • bulk_import_batch_trackers
    • Sharding keys: namespace_id/project_id
    • MR: !168390 (merged)
    • Relation to sharding keys: bulk_import_trackers => bulk_import_entities => namespace_id/project_id
  • bulk_import_configurations
    • Note: Same as bulk_imports. As per #499829 (closed) we need to find a solution to prevent cross-organizational references.
  • bulk_import_entities
    • Sharding keys: organization_id, namespace_id, project_id
    • First MR: !172262 (merged)
    • Note: As these are created before a project or namespace exists, we must also have an organization ID as the sharding key, and then switch over to the project/namespace when available. This will be done in two MRs.
  • bulk_import_export_batches
    • Sharding keys: group_id/project_id
    • MR: !168387 (merged)
    • Relation to sharding keys: bulk_import_exports => project_id/group_id
  • bulk_import_export_uploads
    • Sharding keys: group_id/project_id
    • MR: !168388 (merged)
    • Relation to sharding keys: bulk_import_exports => project_id/group_id
  • bulk_import_failures
    • Sharding keys: namespace_id/project_id
    • MR: !168389 (merged)
    • Relation to sharding keys: bulk_import_entities => namespace_id/project_id
  • bulk_import_trackers
    • Sharding keys: namespace_id/project_id
    • MR: !168385 (merged)
    • Relation to sharding keys: bulk_import_entities => namespace_id/project_id
  • import_export_uploads
    • Sharding keys: group_id/project_id (Already exist in the table)
    • MR: !168392 (merged)
  • import_failures
    • Sharding keys: group_id/project_id (Already exist in the table)
    • Note: Some rows have neither set and are linked to the user instead, so we need a fallback option
    • MR: !168393 (merged)
  • project_import_data

This involves choosing one of the following, based on the intended behaviour of the table:

  • The table is not cell-local
    • Set gitlab_schema to gitlab_main_clusterwide.
  • The table is cell-local and requires a sharding key
    • Set gitlab_schema to gitlab_main_cell
    • Add a sharding_key or desired_sharding_key configuration. If the configuration is known but the chosen key doesn't yet meet not-null and foreign key requirements, you can add an exception to allowed_to_be_missing_not_null or allowed_to_be_missing_foreign_key to get the pipeline passing. Please link to a follow-up issue in a code comment next to the exception.
    • You may also need to set allow_cross_joins, allow_cross_transactions and allow_cross_foreign_keys if changing the schema causes pipeline failures. See db/docs/epics.yml for an example.
  • The table is cell-local and does not require a sharding key
    • Set gitlab_schema to gitlab_main_cell.
    • Set exempt_from_sharding to true.
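
As a rough illustration of the three options above, the fragments below sketch what the relevant keys might look like in a table's db/docs/<table_name>.yml file. This is a sketch under assumptions, not the content of any merged MR: the table names example_clusterwide_table and example_exempt_table are hypothetical, the mapping of group_id to the namespaces table is an assumption, and the other fields a real dictionary file carries (classes, description, milestone, and so on) are omitted.

```yaml
# Option 1: the table is not cell-local.
# (example_clusterwide_table is a hypothetical name.)
table_name: example_clusterwide_table
gitlab_schema: gitlab_main_clusterwide
---
# Option 2: the table is cell-local and requires a sharding key.
# Each entry maps a column on this table to the table it references.
# The referenced table names here are assumptions for illustration.
table_name: bulk_import_exports
gitlab_schema: gitlab_main_cell
sharding_key:
  project_id: projects
  group_id: namespaces
---
# Option 3: the table is cell-local and does not require a sharding key.
# (example_exempt_table is a hypothetical name.)
table_name: example_exempt_table
gitlab_schema: gitlab_main_cell
exempt_from_sharding: true
```

Each fragment stands for a separate file; they are shown together only for comparison. For the exact keys supported, including desired_sharding_key, db/docs/epics.yml mentioned above remains the reference example.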

Documentation

Edited by Keeyan Nejad