Detect mis-categorization of gitlab_schema (gitlab_main_clusterwide / gitlab_main_cell)
There are more than 80 tables categorized as gitlab_main_clusterwide. A number of these were found to be mis-categorized, e.g. Re-classify bulk_imports as gitlab_main_cell wi... (#499829 - closed), and Move organizations to gitlab_main_cell (!170029 - merged).
- We should find these tables, so that if they need sharding keys we can start that process sooner rather than later.
- There should be tooling / testing to both detect and prevent such mis-categorization.
Impacts
- (Cells 1.0) Tables mis-categorized as gitlab_main_clusterwide when they should be gitlab_main_cell mean lost time before sharding key and backfill work can start.
- (Cells 1.0) Too many tables categorized as gitlab_main_clusterwide means that we have a big project to sync these tables to all cells, delaying the whole project.
- (Cells 1.5) If there is a dependency between gitlab_main_clusterwide and gitlab_main_cell data, this could introduce errors (e.g. FK violations), causing Org Mover to not work.
Proposed Criteria
- Small table. A table with many rows will be too slow to sync, especially on startup.
- Low write activity. A table with high write activity may cause consistency issues.
- Data integrity. A clusterwide table must not reference a cell table.
- Reference table. References from a cell table to a clusterwide table must be checked for split-brain.
- Consistency. A clusterwide table must not use auto-increment sequences.
- Seed tables. A clusterwide table should not insert any data during the seeding process.
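As a starting point for the tooling, some of the criteria above could be checked mechanically. The following is a minimal sketch, not existing GitLab code: the Table structure, find_violations, and the max_rows threshold are all hypothetical, and it assumes table metadata (schema, row count, sequence usage, FK targets) has already been collected from somewhere. It covers only the small-table, data-integrity, and consistency criteria; write activity and seeding would need other inputs.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

CLUSTERWIDE = "gitlab_main_clusterwide"
CELL = "gitlab_main_cell"


@dataclass
class Table:
    """Hypothetical metadata record for one table."""
    name: str
    schema: str
    row_count: int = 0
    uses_sequence: bool = False
    # Names of the tables this table has foreign keys to.
    references: List[str] = field(default_factory=list)


def find_violations(tables: List[Table], max_rows: int = 10_000) -> List[Tuple[str, str]]:
    """Return (table name, reason) pairs for clusterwide tables that break a criterion.

    max_rows is an arbitrary placeholder threshold, not an agreed limit.
    """
    by_name = {t.name: t for t in tables}
    violations = []
    for table in tables:
        if table.schema != CLUSTERWIDE:
            continue
        # Small table: large tables are slow to sync.
        if table.row_count > max_rows:
            violations.append((table.name, "too many rows"))
        # Consistency: no auto-increment sequences.
        if table.uses_sequence:
            violations.append((table.name, "uses auto-increment sequence"))
        # Data integrity: a clusterwide table must not reference a cell table.
        for ref in table.references:
            target = by_name.get(ref)
            if target is not None and target.schema == CELL:
                violations.append((table.name, f"references cell table {ref}"))
    return violations
```

A test could build the Table list from the schema classification and fail when find_violations returns anything, which would cover both detection of existing mis-categorization and prevention of new cases.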
Action items
- Review proposed criteria.
- Update docs in https://docs.gitlab.com/ee/development/database/multiple_databases.html#gitlab-schema, and https://docs.gitlab.com/ee/development/cells/#choose-either-the-gitlab_main_cell-or-gitlab_main_clusterwide-schema.
- Write tests that will run to detect possible issues with classification.
- Open issues for tables which are mis-categorized.
Edited by Thong Kuah