Set sharding keys for feature category `importers` tables
About
As part of Cells preparation, all tables need to have a "sharding key" defined.
This issue was created from !152751 (merged), where we set the `sharding_key_issue_url` for some feature category `importers` tables to point to this issue, as a temporary step to allow us to schedule the work into a milestone.
If we have questions or concerns, we can reach out to #g_tenant-scale.
The below description was copied from !152751 (merged).
Task
Sharding keys need to be set for the tables:
- `bulk_imports`
  - Sharding keys: TBD
  - Note: It's currently classified as `gitlab_main_clusterwide` but we should change it to `gitlab_main_cell` and determine what sharding keys to use. There's a separate issue to work on this: #499829 (closed)
- `bulk_import_exports`
  - Sharding keys: `group_id`/`project_id`
  - MR: !168411 (merged)
  - Directly linked to `group_id`/`project_id` already
- `bulk_import_batch_trackers`
  - Sharding keys: `namespace_id`/`project_id`
  - MR: !168390 (merged)
  - Relation to sharding keys: `bulk_import_trackers` => `bulk_import_entities` => `namespace_id`/`project_id`
- `bulk_import_configurations`
  - Note: Same as `bulk_imports`. As per #499829 (closed) we need to find a solution to prevent cross-organizational references.
- `bulk_import_entities`
  - Sharding keys: `organization_id`, `namespace_id`, `project_id`
  - First MR: !172262 (merged)
  - Note: As these are created before a project or namespace exists, we must also have an organization ID as the sharding key, and then switch over to the project/namespace when available. This will be done in 2 MRs.
- `bulk_import_export_batches`
  - Sharding keys: `group_id`/`project_id`
  - MR: !168387 (merged)
  - Relation to sharding keys: `bulk_import_exports` => `project_id`/`group_id`
- `bulk_import_export_uploads`
  - Sharding keys: `group_id`/`project_id`
  - MR: !168388 (merged)
  - Relation to sharding keys: `bulk_import_exports` => `project_id`/`group_id`
- `bulk_import_failures`
  - Sharding keys: `namespace_id`/`project_id`
  - MR: !168389 (merged)
  - Relation to sharding keys: `bulk_import_entities` => `namespace_id`/`project_id`
- `bulk_import_trackers`
  - Sharding keys: `namespace_id`/`project_id`
  - MR: !168385 (merged)
  - Relation to sharding keys: `bulk_import_entities` => `namespace_id`/`project_id`
- `import_export_uploads`
  - Sharding keys: `group_id`/`project_id` (already exist in the table)
  - MR: !168392 (merged)
- `import_failures`
  - Sharding keys: `group_id`/`project_id` (already exist in the table)
  - Note: Some rows have neither set and are linked to the user instead, so we need a fallback option
  - MR: !168393 (merged)
- `project_import_data`
  - Sharding key: `project_id` (already exists in the table)
  - MR: !168391 (merged)
This involves choosing one of the following, based on the intended behaviour of the table:
- The table is not cell-local
  - Set `gitlab_schema` to `gitlab_main_clusterwide`.
- The table is cell-local and requires a sharding key
  - Set `gitlab_schema` to `gitlab_main_cell`
  - Add a `sharding_key` or `desired_sharding_key` configuration. If the configuration is known but the chosen key doesn't yet meet not-null and foreign key requirements, you can add an exception to `allowed_to_be_missing_not_null` or `allowed_to_be_missing_foreign_key` to get the pipeline passing. Please link to a follow-up issue in a code comment next to the exception.
  - You may also need to set `allow_cross_joins`, `allow_cross_transactions` and `allow_cross_foreign_keys` if changing the schema causes pipeline failures. See `db/docs/epics.yml` for an example.
- The table is cell-local and does not require a sharding key
  - Set `gitlab_schema` to `gitlab_main_cell` and
  - Set `exempt_from_sharding` to `true`.
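
For illustration, the "cell-local and requires a sharding key" option might look roughly like the following `db/docs` entry. This is a minimal sketch only: it uses `project_import_data` because its `project_id` column already exists, it omits the other metadata fields a real entry carries, and the merged file in !168391 may differ.

```yaml
# db/docs/project_import_data.yml (illustrative sketch, not the merged file)
table_name: project_import_data
feature_categories:
  - importers
# Cell-local table, so it moves to the cell schema.
gitlab_schema: gitlab_main_cell
# Maps the sharding key column to the table it references.
sharding_key:
  project_id: projects
```

A table whose key column does not exist yet, or is not yet backfilled, would presumably use `desired_sharding_key` here instead of `sharding_key`.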
Documentation
Edited by Keeyan Nejad