Organization Data Migration Feature Parity Phase 1 - Sharding and Operational Improvements
## Problem Statement
The Legacy Cell holds the data for all organizations. We have [identified a set of cohorts](https://gitlab.com/gitlab-com/gl-infra/mstaff/-/issues/474) that we want to migrate out of the Legacy Cell to a Protocell to permanently remove their load from the Legacy Cell database.
We need internal tooling to migrate the multiple types of data (Git, Blob, Container Repository, Database) that are scoped to an organization. We should strive for no data loss and a short downtime window for that organization.
Phase 1 of [Organization Data Migration](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/1626) established the foundation for organization migration by implementing selective sync by organization for multiple data types, moving verification details into separate tables, and introducing Geo Protocells Mode into the GDK.
**Phase 1 focuses on 3 key areas:**
1. **Geo Verification State Tables** (Epic: Geo verification state tables migration strategy)
* Mark verification state tables as `gitlab_main_org` instead of cell-local
* Define sharding keys on \~20 state tables
* Enable these tables to be replicated during organization migration
* Swap foreign keys to loose foreign keys for `lfs_objects_projects`
2. **Migration Readiness & Testing** (Epic: First customer organization move)
* Implement full E2E org move test in GDK
* Identify valuable metrics pre-move and during cutover
* Assist with move tests on production
* Fix bugs found during production moves
3. **Scalability & Operational Improvements**
* Job artifact metrics collection must scale (bucketed counting approach)
* Data compatibility tooling (check if org has data Geo cannot migrate)
* Background migration handling during org moves
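As an illustration of the bucketed counting approach mentioned above, here is a minimal Python sketch. It is not the Geo implementation: the `job_artifacts` table layout, the `organization_id` column, and the `bucketed_count` helper are all assumptions made for the example. The idea is to replace one long-running `COUNT(*)` with many short queries over fixed-size ID ranges, then sum the per-bucket results.

```python
import sqlite3


def bucketed_count(conn, table, org_id, bucket_size=1000):
    """Count rows for one organization by summing counts over ID buckets.

    Hypothetical helper: assumes `table` has an integer `id` primary key and
    an `organization_id` column. `table` is interpolated directly into the
    SQL, so it must come from trusted code, never from user input.
    """
    lo, hi = conn.execute(f"SELECT MIN(id), MAX(id) FROM {table}").fetchone()
    if lo is None:  # empty table
        return 0

    total = 0
    start = lo
    while start <= hi:
        end = start + bucket_size
        # Each query touches at most `bucket_size` IDs, so it stays short.
        (count,) = conn.execute(
            f"SELECT COUNT(*) FROM {table} "
            "WHERE id >= ? AND id < ? AND organization_id = ?",
            (start, end, org_id),
        ).fetchone()
        total += count
        start = end
    return total
```

Because each bucket is independent, a real implementation could also persist per-bucket results or spread the queries over time. Whether that matches the planned design is not stated in this epic.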
### Exit Criteria
- [ ] Assist and support efforts to have test organization migrated successfully on staging
- [ ] ~~Support test organization migrated successfully on production~~ - stretch goal
- [x] Finalize strategy around handling background migrations
- [ ] Identify metrics to capture pre-move, during cutover, and post-move
- [ ] Fix bugs found during test organization moves on staging
- [ ] Complete verification table sharding work
- [ ] Add E2E test and Geo with Protocells Mode in GDK
### Participants
* @mkozono
* @nsilva5
* @dbalexandre
* @s_murray
## Migration Process Overview
Per the design document, the Geo-leveraged migration follows these phases:
1. Pre-move replication - Geo continuously replicates non-PG data while the organization is active
2. Quiesce organization - Archive root group, add maintenance page, drain requests and jobs
3. Copy PG data - Extract and insert organization-scoped data with FK validation
4. Finalize non-PG data - Wait until Geo replication and verification reach 100%
5. Validate - Compare filtered table checksums between cells
6. Switchover - Update Topology Service routing, remove maintenance page, unarchive group
7. Post-move - Re-run in-progress background migrations on destination cell
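The Validate step (comparing filtered table checksums between cells) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the design document's method: it assumes every compared table has `id` and `organization_id` columns, uses SQLite connections to stand in for the source and destination cells, and the helper names `filtered_table_checksum` and `mismatched_tables` are hypothetical.

```python
import hashlib
import sqlite3


def filtered_table_checksum(conn, table, org_id):
    """Return a SHA-256 digest over one organization's rows in `table`.

    Assumes (for this sketch only) that the table has `id` and
    `organization_id` columns. Rows are ordered by `id` so both cells
    hash them in the same order. `table` must be a trusted identifier.
    """
    digest = hashlib.sha256()
    rows = conn.execute(
        f"SELECT * FROM {table} WHERE organization_id = ? ORDER BY id",
        (org_id,),
    )
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


def mismatched_tables(source, destination, tables, org_id):
    """List the tables whose filtered checksums differ between two cells."""
    return [
        table
        for table in tables
        if filtered_table_checksum(source, table, org_id)
        != filtered_table_checksum(destination, table, org_id)
    ]
```

An empty result from `mismatched_tables` would mean the organization's rows match in every listed table. Rows belonging to other organizations on the source cell are filtered out and do not affect the comparison.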
### Dependencies
* https://gitlab.com/groups/gitlab-org/-/epics/17388+ (FF) - An organization is not safe to migrate if it is not known to be isolated.
* https://gitlab.com/gitlab-org/gitlab/-/issues/534565+ (FF) - An organization is not safe to migrate if writes are occurring to its data. We may be able to replace this dependency with some other ~"group::organizations" effort which can be used to stop an organization's data from mutating while we migrate.
<!-- STATUS NOTE START -->
## Status 2026-02-26
:clock1: **total hours spent this week by all contributors**: 56
:tada: **achievements**:
- Sharding key work 10/19 issues resolved: https://gitlab.com/groups/gitlab-org/-/epics/20487+
- 2 were resolved this week: https://gitlab.com/gitlab-org/gitlab/-/issues/587549+ and https://gitlab.com/gitlab-org/gitlab/-/issues/587554+ with NOT NULL constraints added to verification state tables
- Foundational MR merged for Org Migration Target mode: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/223454+
:arrow_forward: **next**:
- New epic opened: https://gitlab.com/groups/gitlab-org/-/epics/20933+ to track each upload partition separately for replication (25 child issues created with phased implementation strategy)
- Complete and merge 3 in-review sharding key MRs: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/224803+, https://gitlab.com/gitlab-org/gitlab/-/merge_requests/224563+, https://gitlab.com/gitlab-org/gitlab/-/merge_requests/223102+
- Continue project repository replication implementation https://gitlab.com/gitlab-org/gitlab/-/issues/577735+ (currently 3/10 tasks complete) to unblock 4 dependent repository types in https://gitlab.com/groups/gitlab-org/-/epics/18601+
- Merge boilerplate generator script https://gitlab.com/gitlab-org/gitlab/-/merge_requests/224164+ to accelerate implementation of 22 upload partition replicators in https://gitlab.com/groups/gitlab-org/-/epics/20933+
- Complete AbuseReport uploads POC https://gitlab.com/gitlab-org/gitlab/-/merge_requests/224245+ to validate pattern for https://gitlab.com/groups/gitlab-org/-/epics/20933+
- Resolve LFS architecture decision for https://gitlab.com/gitlab-org/gitlab/-/issues/587556+
- Continue work on https://gitlab.com/groups/gitlab-org/-/epics/17308+ (see child epic for details)
_Copied from https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/1817#note_3114183490_
<!-- STATUS NOTE END -->