Add sharding key for pool_repositories
Sharding keys need to be set for the tables: keys. This involves choosing one of the following, based on the intended behaviour of the table:
- The table is not cell-local
  - Set gitlab_schema to gitlab_main_clusterwide.
- The table is cell-local and requires a sharding key
  - Set gitlab_schema to gitlab_main_cell
  - Add a sharding_key or desired_sharding_key configuration. If the configuration is known but the chosen key doesn't yet meet not-null and foreign key requirements, you can add an exception to allowed_to_be_missing_not_null or allowed_to_be_missing_foreign_key to get the pipeline passing. Please link to a follow-up issue in a code comment next to the exception.
  - You may also need to set allow_cross_joins, allow_cross_transactions and allow_cross_foreign_keys if changing the schema causes pipeline failures. See db/docs/epics.yml for an example.
- The table is cell-local and does not require a sharding key
  - Set gitlab_schema to gitlab_main_cell_local
  - There must be no foreign key references to/from organization tables
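As a rough illustration of the options listed above, the sketch below parses a hypothetical dictionary entry for pool_repositories and checks that a gitlab_main_cell table declares either sharding_key or desired_sharding_key. The YAML content (in particular the organization_id to organizations mapping) and the sharding_problems helper are assumptions made for this example; this is not GitLab's actual validation code, and the real file under db/docs/ contains more fields.

```ruby
require 'yaml'

# Hypothetical dictionary entry, loosely modeled on the files in db/docs/.
# The organization_id => organizations mapping is an assumption for illustration.
POOL_REPOSITORIES_DOC = <<~YAML
  table_name: pool_repositories
  gitlab_schema: gitlab_main_cell
  sharding_key:
    organization_id: organizations
YAML

# Returns a list of problems for one dictionary entry, following the three
# options described in the issue. Illustrative helper only.
def sharding_problems(entry)
  schema = entry['gitlab_schema']
  has_key = entry.key?('sharding_key') || entry.key?('desired_sharding_key')

  case schema
  when 'gitlab_main_clusterwide', 'gitlab_main_cell_local'
    [] # no sharding key is required for these schemas
  when 'gitlab_main_cell'
    has_key ? [] : ['gitlab_main_cell tables need sharding_key or desired_sharding_key']
  else
    ["unexpected gitlab_schema: #{schema.inspect}"]
  end
end

problems = sharding_problems(YAML.safe_load(POOL_REPOSITORIES_DOC))
puts problems.empty? ? 'ok' : problems
```

A cell-local entry without either key would come back with one problem, which mirrors the kind of pipeline failure the exceptions mentioned above are meant to work around temporarily.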
Documentation
- Choosing either the gitlab_main_cell or gitlab_main_clusterwide schema
- Defining a sharding key for all cell-local tables
- Defining a desired_sharding_key to automatically backfill a sharding_key
Summary
This issue has many comments and has changed directions a couple of times. Here's my attempt at a summary:
- The work was started, ran into a few data migration issues, stopped, changed direction, and started again. That's what makes following the threads below difficult.
- We have ultimately decided we do want to add an organization_id and backfill it.
- The work is picked up for 18.5 to 18.6 (has to be split across two releases)
- We have orphaned data on the pool_repositories table where we don't know how to get the organization_id. We are OK with setting the orphaned data to organization_id 1 for now, and figuring out how we can trace back to a proper organization id via Gitaly at a later date.
  - Gitaly slack discussion: https://gitlab.slack.com/archives/C3ER3TQBT/p1759325108230759
  - Related issue for the Gitaly aspect here: https://gitlab.com/gitlab-org/gitlab/-/issues/573591
  - Setting to organization_id 1 discussion: https://gitlab.slack.com/archives/C0609EXHX6F/p1759348241265409
  - We discussed the possibility of deleting orphaned data but we are not doing it at the moment, see organization_id discussion for more details.
- There is existing work that can be leveraged for the migration: !181158 (diffs) (thank you @olaoluro for this, it will be very useful).
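The fallback rule in the summary above can be sketched in plain Ruby. This is only a model of the decision, not the migration itself: looking up the organization through a source_project_id column and a project-to-organization map is an assumption made for this example, as are the helper name and the sample rows. The only part taken from the issue is that orphaned rows fall back to organization_id 1.

```ruby
# Fallback organization for orphaned rows, per the decision in the summary.
DEFAULT_ORGANIZATION_ID = 1

# Picks the organization_id for one pool repository row.
# `project_org_ids` maps project id => organization_id. Resolving the value
# through source_project_id is an assumption, not something the issue specifies.
def organization_id_for(pool_repository, project_org_ids)
  project_org_ids[pool_repository[:source_project_id]] || DEFAULT_ORGANIZATION_ID
end

# Hypothetical sample data.
project_org_ids = { 10 => 7 }
rows = [
  { id: 1, source_project_id: 10 },  # source project exists
  { id: 2, source_project_id: 99 },  # source project is gone (orphaned)
  { id: 3, source_project_id: nil }  # no source project recorded (orphaned)
]

rows.each do |row|
  org_id = organization_id_for(row, project_org_ids)
  puts "pool_repository #{row[:id]} -> organization_id #{org_id}"
end
```

Keeping the fallback in one constant makes it easy to find later, when the orphaned rows are traced back to a proper organization via Gitaly.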
Edited by Hunter Stewart