
Draft: Add sharding key tracking issues for importers

Add sharding key tracking issues for feature category importers.

Background

These tables could not be classified automatically and will require manual input. Eventually, all tables will need to be correctly classified, but we understand that this will be complex for some tables and will take time. For now, our goal for this task is to ensure that all remaining tables are tracked in an issue, and to classify any straightforward cases that our automation may have missed (options 1 and 2 below).

We have assigned a random backend engineer from group::import and integrate as the initial DRI for this task, as well as an engineering manager for visibility. Please note that we are not requesting a large time commitment: creating a single issue and linking it from all tables is perfectly acceptable.

When you are finished, please assign this merge request to the database reviewer/maintainer suggested by Danger.

If you have any questions or concerns, reach out to #g_tenant-scale.

Task

For each table, please select one of the following options:

Option 1: Add a sharding key tracking issue

This option is best suited to tables whose sharding behaviour is unknown, or will require additional work before a sharding key can be defined.

Replace the TODO in the dictionary file with a link to an issue in the gitlab-org/gitlab project.

- sharding_key_issue_url: TODO
+ sharding_key_issue_url: https://gitlab.com/gitlab-org/gitlab/-/issues/1234

You can create a new issue or link an existing one, and multiple entries can refer to the same issue. These issues will be used to track the work remaining on the progress dashboard.
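For example, two dictionary files could point at one shared tracking issue. This is a sketch only: the db/docs/<table_name>.yml paths and the table_name key are assumed from the usual dictionary layout, only the relevant keys are shown, the two files are written as two YAML documents in one block, and the issue URL is the placeholder from the diff above.

```yaml
# db/docs/bulk_import_entities.yml (assumed path; other keys omitted)
---
table_name: bulk_import_entities
sharding_key_issue_url: https://gitlab.com/gitlab-org/gitlab/-/issues/1234
# db/docs/bulk_import_trackers.yml (assumed path; other keys omitted)
---
table_name: bulk_import_trackers
# Same placeholder URL: both tables are tracked by one issue.
sharding_key_issue_url: https://gitlab.com/gitlab-org/gitlab/-/issues/1234
```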

If you are creating a new issue, you can copy the following contents into the issue description:


Issue Title: Set sharding keys for tables in 'group::import and integrate'

Issue Description:

Sharding keys need to be set for the tables: bulk_import_batch_trackers, bulk_import_configurations, bulk_import_entities, bulk_import_export_batches, bulk_import_export_uploads, bulk_import_failures, bulk_import_trackers, import_export_uploads, import_failures, project_import_data

This involves choosing one of the following, based on the intended behaviour of the table:

  • The table is not cell-local
    • Set gitlab_schema to gitlab_main_clusterwide.
  • The table is cell-local and requires a sharding key
    • Set gitlab_schema to gitlab_main_cell.
    • Add a sharding_key or desired_sharding_key configuration. If the configuration is known but the chosen key doesn't yet meet not-null and foreign key requirements, you can add an exception to allowed_to_be_missing_not_null or allowed_to_be_missing_foreign_key to get the pipeline passing. Please link to a follow-up issue in a code comment next to the exception.
    • You may also need to set allow_cross_joins, allow_cross_transactions and allow_cross_foreign_keys if changing the schema causes pipeline failures. See db/docs/epics.yml for an example.
  • The table is cell-local and does not require a sharding key
    • Set gitlab_schema to gitlab_main_cell.
    • Set exempt_from_sharding to true.

Documentation
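If moving a table to gitlab_main_cell breaks the pipeline, the cross-schema allowances might look like the sketch below. It assumes, as db/docs/epics.yml is understood to do, that each allow_cross_* key takes a list of the schemas the table is allowed to cross into; example_table is a hypothetical name, and you should check db/docs/epics.yml for the exact form and list only the allowances the failing specs actually require.

```yaml
---
table_name: example_table # hypothetical table name, for illustration only
gitlab_schema: gitlab_main_cell
# Assumed shape: each key lists the schemas this table may cross into.
allow_cross_joins:
  - gitlab_main_clusterwide
allow_cross_transactions:
  - gitlab_main_clusterwide
allow_cross_foreign_keys:
  - gitlab_main_clusterwide
```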

Option 2: Add sharding key configuration

This option is best suited to tables with an easily identifiable sharding key that will require minimal work to define.

Remove sharding_key_issue_url from the dictionary file and instead complete the classification for the table. This involves choosing one of the following, based on the intended behaviour of the table:

  • The table is not cell-local
    • Set gitlab_schema to gitlab_main_clusterwide.
  • The table is cell-local and requires a sharding key
    • Set gitlab_schema to gitlab_main_cell.
    • Add a sharding_key or desired_sharding_key configuration. If the configuration is known but the chosen key doesn't yet meet not-null and foreign key requirements, you can add an exception to allowed_to_be_missing_not_null or allowed_to_be_missing_foreign_key to get the pipeline passing. Please link to a follow-up issue in a code comment next to the exception.
    • You may also need to set allow_cross_joins, allow_cross_transactions and allow_cross_foreign_keys if changing the schema causes pipeline failures. See db/docs/epics.yml for an example.
  • The table is cell-local and does not require a sharding key
    • Set gitlab_schema to gitlab_main_cell.
    • Set exempt_from_sharding to true.
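The three outcomes above can be sketched as dictionary-file fragments. This is a hedged illustration, not a recommendation for any table in this merge request: example_table is a hypothetical name, project_id: projects is an assumed sharding key (the sharding column mapped to the table it references), and the alternatives are shown as separate YAML documents even though a real table gets exactly one of them.

```yaml
# In every alternative, sharding_key_issue_url has been removed.
# Alternative 1: the table is not cell-local.
---
table_name: example_table # hypothetical table name
gitlab_schema: gitlab_main_clusterwide
# Alternative 2: the table is cell-local and requires a sharding key.
---
table_name: example_table
gitlab_schema: gitlab_main_cell
sharding_key:
  project_id: projects # assumed: sharding column -> referenced table
# Alternative 3: the table is cell-local and does not require a sharding key.
---
table_name: example_table
gitlab_schema: gitlab_main_cell
exempt_from_sharding: true
```

A desired_sharding_key entry would replace sharding_key in Alternative 2 when the column does not exist yet; its nested shape is not shown here, so follow the linked documentation for it.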

Documentation

Related to #455137 (closed)

This change was generated by gitlab-housekeeper using the Keeps::AddShardingKeyTrackingIssues keep.

To provide feedback on your experience with gitlab-housekeeper, please comment in #442003.
