Detect mis-categorization of gitlab_schema (gitlab_main_clusterwide / gitlab_main_cell)

There are more than 80 tables classified as gitlab_main_clusterwide. A number of these have been found to be mis-categorized, e.g. Re-classify bulk_imports as gitlab_main_cell wi... (#499829 - closed) and Move organizations to gitlab_main_cell (!170029 - merged).

  1. We should find these tables so that, if they need sharding keys, we can start that process sooner rather than later.
  2. There should be tooling and tests to both detect and prevent such mis-categorization.
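One possible shape for the "prevent" half is a CI check against an explicit, reviewed allowlist of clusterwide tables. This is a hypothetical sketch: in GitLab each table's gitlab_schema lives in per-table metadata, but here the table-to-schema mapping is passed in as a plain dict, and the allowlist, function names and table names are all illustrative assumptions.

```python
from typing import Dict, List, Set

CLUSTERWIDE = "gitlab_main_clusterwide"


def unreviewed_clusterwide_tables(schemas: Dict[str, str], reviewed: Set[str]) -> List[str]:
    """Return clusterwide tables missing from the (hypothetical) reviewed allowlist.

    `schemas` maps table name -> gitlab_schema.
    `reviewed` holds tables someone has confirmed belong in gitlab_main_clusterwide.
    """
    return sorted(
        table
        for table, schema in schemas.items()
        if schema == CLUSTERWIDE and table not in reviewed
    )


def stale_allowlist_entries(schemas: Dict[str, str], reviewed: Set[str]) -> List[str]:
    """Return allowlisted tables that are no longer classified as clusterwide."""
    return sorted(table for table in reviewed if schemas.get(table) != CLUSTERWIDE)


if __name__ == "__main__":
    # Illustrative data only, not real GitLab tables.
    schemas = {
        "table_a": "gitlab_main_clusterwide",
        "table_b": "gitlab_main_clusterwide",
        "table_c": "gitlab_main_cell",
    }
    reviewed = {"table_a", "table_c"}
    print(unreviewed_clusterwide_tables(schemas, reviewed))  # ['table_b']
    print(stale_allowlist_entries(schemas, reviewed))  # ['table_c']
```

Running the first function once over all tables gives the "detect" list to review; failing CI on any non-empty result afterwards stops new tables from being added as gitlab_main_clusterwide without review. The second function keeps the allowlist honest when a table is later re-classified.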

Impacts

  1. (Cells 1.0) Tables mis-categorized as gitlab_main_clusterwide when they should be gitlab_main_cell mean lost time for starting sharding key and backfill work.
  2. (Cells 1.0) Too many tables categorized as gitlab_main_clusterwide means that we have a big project to sync these tables to all cells, delaying the whole project.
  3. (Cells 1.5) If there are dependencies between gitlab_main_clusterwide and gitlab_main_cell data, they could introduce errors (e.g. FK violations) that prevent Org Mover from working.
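The riskiest direction of dependency, clusterwide data pointing at cell data, can be found mechanically for constraint-backed references. A minimal sketch, assuming foreign keys are available as (from_table, to_table) pairs (for example read from the database catalog) alongside each table's gitlab_schema; the function name and sample tables are hypothetical. References that have no database constraint would not be caught this way.

```python
from typing import Dict, Iterable, List, Tuple

CLUSTERWIDE = "gitlab_main_clusterwide"
CELL = "gitlab_main_cell"


def clusterwide_to_cell_references(
    schemas: Dict[str, str],
    foreign_keys: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Return (from_table, to_table) edges where a clusterwide table references a cell table.

    `schemas` maps table name -> gitlab_schema; tables missing from it are ignored.
    `foreign_keys` is an assumed list of FK edges, one per constraint.
    """
    return sorted(
        (src, dst)
        for src, dst in foreign_keys
        if schemas.get(src) == CLUSTERWIDE and schemas.get(dst) == CELL
    )


if __name__ == "__main__":
    # Illustrative data only, not real GitLab tables.
    schemas = {"global_a": CLUSTERWIDE, "global_b": CLUSTERWIDE, "cell_a": CELL}
    fks = [
        ("global_a", "cell_a"),   # clusterwide -> cell: flagged
        ("cell_a", "global_a"),   # cell -> clusterwide: not flagged here
        ("global_b", "global_a"), # clusterwide -> clusterwide: fine
    ]
    print(clusterwide_to_cell_references(schemas, fks))  # [('global_a', 'cell_a')]
```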

Proposed Criteria

  • Small table. A table with many rows will be too slow to sync, especially on startup.
  • Low write activity. A table with high write activity may cause consistency issues.
  • Data integrity. A clusterwide table must not reference a cell table.
  • Reference table. References from a cell table to a clusterwide table must be checked for split-brain.
  • Consistency. A clusterwide table must not use auto-increment sequences.
  • Seed tables. A clusterwide table should not insert any data during the seeding process.
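The proposed criteria could be turned into a per-table report. This is a hypothetical sketch: the TableStats fields, the thresholds and all names are assumptions rather than agreed values, and gathering the inputs (row counts, write rates, seed activity) is left out. The data-integrity criterion needs foreign-key information and is not covered here. References from cell tables are only flagged for manual split-brain review, since the criterion asks for them to be checked rather than forbidden.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TableStats:
    """Hypothetical per-table facts for a table classified as gitlab_main_clusterwide."""
    name: str
    row_count: int
    writes_per_minute: float
    uses_sequence: bool  # e.g. an auto-increment id column
    rows_inserted_by_seeds: int
    referenced_by_cell_tables: List[str] = field(default_factory=list)


# Placeholder thresholds; real values would need to be agreed on.
MAX_ROWS = 10_000
MAX_WRITES_PER_MINUTE = 10.0


def clusterwide_violations(t: TableStats) -> List[str]:
    """List which proposed criteria the table fails (empty list means it passes)."""
    problems = []
    if t.row_count > MAX_ROWS:
        problems.append("too many rows to sync")
    if t.writes_per_minute > MAX_WRITES_PER_MINUTE:
        problems.append("high write activity")
    if t.uses_sequence:
        problems.append("uses an auto-increment sequence")
    if t.rows_inserted_by_seeds > 0:
        problems.append("inserts data during seeding")
    if t.referenced_by_cell_tables:
        # Not an automatic failure: these references need a manual split-brain review.
        refs = ", ".join(sorted(t.referenced_by_cell_tables))
        problems.append("review references from: " + refs)
    return problems


if __name__ == "__main__":
    # Illustrative data only, not a real GitLab table.
    t = TableStats("table_x", row_count=50_000, writes_per_minute=1.0,
                   uses_sequence=True, rows_inserted_by_seeds=0,
                   referenced_by_cell_tables=["cell_b", "cell_a"])
    for problem in clusterwide_violations(t):
        print(problem)
```

A table with any entry other than the review note is a candidate for re-classification as gitlab_main_cell.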

Action items

Edited by Thong Kuah