Organization migration: Geo verification state tables sharding key work
## Problem Statement

We are evaluating two options to migrate PostgreSQL data from the legacy cell to the Protocell during organization moves:

1. **Pre-move copy approach**: Copy PG data as a pre-move step to start Geo replication and reduce downtime, then put the organization in maintenance mode (read-only) and copy PG data again, waiting until Geo replication and verification reach 100%.
2. **AWS DMS approach**: Use AWS DMS for the initial full load of PG data, then start Geo replication to reduce downtime while keeping AWS DMS running for CDC (Change Data Capture), then put the organization in maintenance mode (read-only) and wait until Geo replication and verification reach 100%.

## Current Issue

Geo verification state tables (all `*_states` tables owned by Geo) have been marked as `gitlab_main_cell_local` or `gitlab_ci_cell_local` in [MR gitlab!182030](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/182030/diffs). This means these tables won't be copied or replicated by either of the above approaches to the target cell (Protocell). However, the Protocell needs to read the checksum from these tables during the verification phase.
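
To make the gap concrete, here is a minimal sketch of how a table's schema label could determine whether an organization move picks it up. It is an illustration only, not the Org Mover's actual logic: the `Table` class, the `tables_to_migrate` helper, the rule that only `gitlab_main_org` tables are copied, and all table names are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption for illustration: only organization-scoped tables are
# copied or replicated during an organization move.
ORG_SCOPED_SCHEMAS = {"gitlab_main_org"}


@dataclass(frozen=True)
class Table:
    name: str                           # hypothetical table name
    gitlab_schema: str                  # schema label assigned to the table
    sharding_key: Optional[str] = None  # column tying rows to an organization


def tables_to_migrate(tables: List[Table]) -> List[str]:
    """Return the names of tables an organization move would copy."""
    return [t.name for t in tables if t.gitlab_schema in ORG_SCOPED_SCHEMAS]


def tables_left_behind(tables: List[Table]) -> List[str]:
    """Return the names of tables that stay on the legacy cell."""
    return [t.name for t in tables if t.gitlab_schema not in ORG_SCOPED_SCHEMAS]


# Hypothetical tables; the two *_states entries stand in for Geo
# verification state tables marked as cell-local.
TABLES = [
    Table("example_org_data", "gitlab_main_org", "organization_id"),
    Table("example_states", "gitlab_main_cell_local"),
    Table("example_ci_states", "gitlab_ci_cell_local"),
]

if __name__ == "__main__":
    print("copied:", tables_to_migrate(TABLES))
    print("left behind:", tables_left_behind(TABLES))
```

Under these assumptions, both `*_states` stand-ins end up in the "left behind" list, which is why the Protocell would have no checksums to read during verification.
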
## Constraints

- The Protocell will not have access to the legacy cell's database to read the checksum (per recent Org Mover design changes)
- We do not currently have an API to provide this kind of information
- We need a solution that doesn't require creating new APIs or changing how Geo currently does verification

## Proposed Solution

Mark Geo verification state tables as `gitlab_main_org` instead of `gitlab_main_cell_local`/`gitlab_ci_cell_local`, with the following approach:

1. Define a sharding key on these tables (some may already have one or have a desired sharding key)
2. Copy/replicate these tables as part of the organization migration to the Protocell
3. This approach avoids:
   - Requiring access to the legacy cell's database
   - Creating a new API for checksum information
   - Changing how Geo currently does verification

## References

- [Cells Architecture: Geo-leveraged migration process](https://handbook.gitlab.com/handbook/engineering/architecture/design-documents/cells/migration/#geo-leveraged-migration-process)
- [Tenant Scale team discussion](https://gitlab.com/gitlab-com/gl-infra/tenant-scale/tenant-services/team/-/work_items/329#note_2990201762)
- [MR gitlab!182030: Mark Geo verification state tables as cell-local](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/182030/diffs)

## Questions for Discussion

1. Is marking these tables as `gitlab_main_org` the right approach?
2. Which Geo verification state tables need sharding keys defined?
3. What are the implications of replicating these tables as part of organization migration?
4. Are there any performance or consistency concerns with this approach?
5. How does this affect the Geo verification workflow during organization moves?

## Related Work

- Org Mover design changes
- Cells architecture migration strategy
- Geo replication and verification process
epic