Adding organization_id as a sharding key could lead to issues when moving a group to another Org

We have a problem when adding organization_id to a cell-local table that cannot use either of the usual sharding key candidates, project_id and namespace_id.

Example: !152816 (comment 1915548731). These tables, in a sense, violate organization isolation as they are de-duplication tables that function across all groups and projects. Options:

  1. Add organization_id. This makes things easy for the Org mover, but makes it harder to move a group to another Org. Some data is duplicated.
    1. Mitigation: we need functionality to split an organization anyway.
  2. Add namespace_id. This increases duplicate data.
  3. Add project_id. This increases duplicate data.
  4. Add nothing. This increases Org mover complexity and Org mover downtime.
  5. Make this table cluster-wide (many-to-many replication).
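
To make the duplication trade-off between these options concrete, here is a minimal sketch in Python. It is not GitLab code: the `Usage` records, the sample IDs, and the helpers `dedup_row_count` and `shared_values_on_group_move` are all hypothetical and only model a de-duplication table as a set of (scope, value) pairs.

```python
from collections import namedtuple

# Hypothetical records: which project (in which namespace and organization)
# references a de-duplicated value. All IDs and values are made up.
Usage = namedtuple("Usage", ["organization_id", "namespace_id", "project_id", "value"])

USAGES = [
    Usage(1, 10, 100, "hash-a"),
    Usage(1, 10, 101, "hash-a"),
    Usage(1, 11, 102, "hash-a"),
    Usage(2, 20, 200, "hash-a"),
    Usage(2, 20, 200, "hash-b"),
]


def dedup_row_count(usages, sharding_key=None):
    """Count rows the table would hold if uniqueness were scoped by sharding_key.

    sharding_key=None models a table with no sharding key, where a value is
    stored once no matter who references it.
    """
    rows = set()
    for usage in usages:
        scope = getattr(usage, sharding_key) if sharding_key else None
        rows.add((scope, usage.value))
    return len(rows)


def shared_values_on_group_move(usages, namespace_id):
    """Values that an organization-scoped table shares between the moving
    namespace and other namespaces in the same organization.

    These rows cannot simply be re-pointed at the new organization; they
    would have to be copied or split.
    """
    moving = {u.value for u in usages if u.namespace_id == namespace_id}
    org_ids = {u.organization_id for u in usages if u.namespace_id == namespace_id}
    staying = {
        u.value
        for u in usages
        if u.organization_id in org_ids and u.namespace_id != namespace_id
    }
    return moving & staying


if __name__ == "__main__":
    for key in (None, "organization_id", "namespace_id", "project_id"):
        print(key, dedup_row_count(USAGES, key))
    print(sorted(shared_values_on_group_move(USAGES, 11)))
```

With this sample data the row count grows from 2 (no key) to 3 (organization_id), 4 (namespace_id) and 5 (project_id). Moving namespace 11 out of organization 1 also leaves `hash-a` shared with namespace 10, which is the kind of row that makes a group move harder under option 1.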

Tables that seem to have, or will have, organization_id:

  • analytics_cycle_analytics_stage_event_hashes
  • projects
  • namespaces
  • topics
  • packages_dependencies
  • ci_runners
  • personal_access_tokens

/cc @ayufan @manojmj

Action items

  • Add guidance to the cells sharding development guidelines on when organization_id can be used as a sharding key. See #463768 (comment 1999146361) for suggestions.
Edited by Thong Kuah