Geo: Track each upload partition separately for replication and verification
## Summary
This epic tracks the work to implement Geo replication and verification for each individual upload partition table, rather than using the single `upload_states` table.
## Background
As discussed in [!221773](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/221773), specifically in [this comment](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/221773#note_3056075106) by @mkozono and [this follow-up](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/221773#note_3062604622) by @dbalexandre, the team has decided to track each upload partition separately.
### Why track each upload partition separately?
**Pros:**
- Improves Geo query performance when there are millions of uploads
- Reduces the friction of adding new Geo data types in general
- Increases consistency and reliability
- Avoids the need for required stops
- Anticipates the future work to [stop using the uploads table entirely](https://gitlab.com/gitlab-org/gitlab/-/work_items/425484)
**Cons:**
- Requires more upfront investment (boilerplate code for each data type)
- The Geo sites dashboard will need changes to support many more data types (tracked in [Iteration 4 in Geo Observability Phase 2](https://gitlab.com/groups/gitlab-org/-/work_items/16588))
---
## Phased Delivery Plan
### Phase 1: Foundation & First Replicator (POC)
**Goal:** Validate the approach with a single, low-risk partition
| Issue | Purpose |
|-------|---------|
| #589925 | Reduce Geo Self-Service Framework (SSF) boilerplate for upload partition replicators |
| #589901 | `AbuseReport` uploads - **First replicator** |
**Exit Criteria:**
- [ ] Replicator pattern established and documented
- [ ] First partition replicating successfully on staging
- [ ] Performance baseline captured
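Issue #589925 aims to cut the per-partition boilerplate. One possible approach, shown here as a minimal self-contained sketch, is to declare each partition's metadata once and generate the replicator classes from that list. `UploadPartitionReplicator`, `PARTITIONS`, and the `Replicators` namespace are hypothetical names, not the Geo Self-Service Framework API; real replicators would inherit from the existing SSF classes instead.

```ruby
# Hypothetical base class; real SSF replicators inherit from Geo framework classes.
class UploadPartitionReplicator
  class << self
    attr_reader :table_name, :sharding_key

    # Stores per-partition metadata on the generated subclass.
    def configure(table_name:, sharding_key:)
      @table_name = table_name
      @sharding_key = sharding_key
    end

    # Derives a singular replicable name, e.g. "abuse_report_upload".
    def replicable_name
      table_name.delete_suffix("s")
    end
  end
end

# Hypothetical declarative list; entries come from the Child Issues Reference table.
PARTITIONS = {
  "AbuseReportUploadReplicator" => { table_name: "abuse_report_uploads", sharding_key: :organization_id },
  "ProjectUploadReplicator" => { table_name: "project_uploads", sharding_key: :project_id },
  "NamespaceUploadReplicator" => { table_name: "namespace_uploads", sharding_key: :namespace_id }
}.freeze

# Generates one replicator class per partition instead of hand-writing each file.
module Replicators
  PARTITIONS.each do |class_name, config|
    klass = Class.new(UploadPartitionReplicator) { configure(**config) }
    const_set(class_name, klass)
  end
end

# Example usage:
# Replicators::ProjectUploadReplicator.table_name      # => "project_uploads"
# Replicators::ProjectUploadReplicator.sharding_key    # => :project_id
# Replicators::ProjectUploadReplicator.replicable_name # => "project_upload"
```

In this sketch, adding a partition becomes a one-entry change to the list, which is the kind of reduction the POC is meant to validate before the pattern is applied to the remaining partitions.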
### Phase 2: High-Volume Core Partitions
**Goal:** Tackle the most impactful partitions early to surface scaling issues
| Issue | Table | Rationale |
|-------|-------|-----------|
| #589915 | `project_uploads` | Highest volume |
| #589910 | `namespace_uploads` | Group-level, high usage |
| #589918 | `user_uploads` | User avatars, widespread |
| #589909 | `design_management_action_uploads` | Large files |
**Exit Criteria:**
- [ ] High-volume partitions performing well under load
- [ ] No degradation in Geo sync times
### Phase 3: Import/Export & Bulk Operations
**Goal:** Handle partitions critical for disaster recovery workflows
| Issue | Table |
|-------|-------|
| #589911 | `import_export_upload_uploads` |
| #589906 | `bulk_import_export_upload_uploads` |
| #589916 | `project_import_export_relation_export_upload_uploads` |
| #589919 | `user_permission_export_upload_uploads` |
**Exit Criteria:**
- [ ] Import/export workflows tested end-to-end with Geo
- [ ] Bulk migration scenarios validated
### Phase 4: Security & Compliance Partitions
**Goal:** Ensure vulnerability and compliance data replicates correctly
| Issue | Table |
|-------|-------|
| #589920 | `vulnerability_archive_export_uploads` |
| #589921 | `vulnerability_export_uploads` |
| #589922 | `vulnerability_export_part_uploads` |
| #589923 | `vulnerability_remediation_uploads` |
| #589907 | `dependency_list_export_uploads` |
| #589908 | `dependency_list_export_part_uploads` |
**Exit Criteria:**
- [ ] Security/compliance data integrity verified
### Phase 5: Remaining Partitions (Long Tail)
**Goal:** Complete coverage of all upload types
| Issue | Table |
|-------|-------|
| #589902 | `achievement_uploads` |
| #589903 | `ai_vectorizable_file_uploads` |
| #589904 | `alert_management_alert_metric_image_uploads` |
| #589905 | `appearance_uploads` (sharding key TBD) |
| #589912 | `issuable_metric_image_uploads` |
| #589913 | `organization_detail_uploads` |
| #589914 | `snippet_uploads` |
| #589917 | `project_topic_uploads` |
**Exit Criteria:**
- [ ] All 23 partition replicators implemented
- [ ] Full test coverage
### Phase 6: Switchover & Deprecation
**Goal:** Migrate from `upload_states` to partitioned tables
| Issue | Purpose |
|-------|---------|
| #589924 | Switch from uploads table to partitioned upload tables |
**Activities:**
1. Feature flag rollout (percentage-based ramp-up)
2. Dual-write period for verification
3. Deprecate `upload_states` table usage
4. Update Geo Observability dashboard (coordinate with [&16588](https://gitlab.com/groups/gitlab-org/-/work_items/16588))
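The rollout and dual-write activities can be sketched as follows: during the dual-write period every state change goes to both the legacy `upload_states` store and the per-partition store, a percentage ramp decides which store serves reads, and a comparison step flags divergence. This is a minimal sketch under assumptions: `DualWriteUploadState`, the in-memory hashes standing in for tables, and the CRC32-based bucketing are hypothetical; a real rollout would use GitLab feature flags rather than hand-rolled bucketing.

```ruby
require "zlib"

# Hypothetical stand-in for the dual-write phase; hashes replace the real tables.
class DualWriteUploadState
  attr_reader :legacy, :partitioned

  def initialize(rollout_percentage:)
    @rollout_percentage = rollout_percentage
    @legacy = {}      # stands in for upload_states, keyed by upload id
    @partitioned = {} # stands in for per-partition state tables, keyed by [table, id]
  end

  # Writes every state change to both stores during the dual-write period.
  def record(table:, id:, state:)
    @legacy[id] = state
    @partitioned[[table, id]] = state
  end

  # Buckets each upload deterministically, so a given id always reads from the same store.
  def read_from_partitioned?(id)
    Zlib.crc32(id.to_s) % 100 < @rollout_percentage
  end

  def read(table:, id:)
    read_from_partitioned?(id) ? @partitioned[[table, id]] : @legacy[id]
  end

  # Verification step: returns ids whose state differs between the two stores.
  def mismatches
    @partitioned.filter_map { |(_table, id), state| id if @legacy[id] != state }
  end
end
```

Keying the bucket on the upload id rather than sampling randomly keeps reads stable for a given upload, and raising the percentage only moves additional uploads onto the partitioned store.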
**Exit Criteria:**
- [ ] 100% traffic on partitioned tables
- [ ] Legacy `upload_states` deprecated
- [ ] Documentation updated
---
## Phase Summary
| Phase | Issues | Focus | Risk |
|-------|--------|-------|------|
| 1 | 2 | Foundation + POC | Low |
| 2 | 4 | High-volume partitions | **High** |
| 3 | 4 | Import/Export | Medium |
| 4 | 6 | Security/Compliance | Medium |
| 5 | 8 | Long tail | Low |
| 6 | 1 | Switchover | **High** |
| **Total** | **25** | | |
## Key Risks & Mitigations
| Risk | Mitigation |
|------|------------|
| Performance regression on high-volume partitions | Phase 2 tackles these early; establish baselines in Phase 1 |
| Dashboard overload (23+ new data types) | Coordinate with [&16588](https://gitlab.com/groups/gitlab-org/-/work_items/16588) before Phase 6 |
| `appearance_uploads` sharding key TBD | Resolve in Phase 5; low volume, can defer |
| Switchover data integrity | Dual-write period in Phase 6 |
---
## Child Issues Reference
| Model | Table Name | Sharding Key | Issue |
|-------|------------|--------------|-------|
| `AbuseReport` | `abuse_report_uploads` | `organization_id` | #589901 |
| `Achievements::Achievement` | `achievement_uploads` | `namespace_id` | #589902 |
| `Ai::VectorizableFile` | `ai_vectorizable_file_uploads` | `project_id` | #589903 |
| `AlertManagement::MetricImage` | `alert_management_alert_metric_image_uploads` | `project_id` | #589904 |
| `Appearance` | `appearance_uploads` | `TBD` | #589905 |
| `BulkImports::ExportUpload` | `bulk_import_export_upload_uploads` | `project_id` | #589906 |
| `Dependencies::DependencyListExport` | `dependency_list_export_uploads` | `organization_id`, `namespace_id`, `project_id` | #589907 |
| `Dependencies::DependencyListExport::Part` | `dependency_list_export_part_uploads` | `organization_id` | #589908 |
| `DesignManagement::Action` | `design_management_action_uploads` | `namespace_id` | #589909 |
| `Group` | `namespace_uploads` | `namespace_id` | #589910 |
| `ImportExportUpload` | `import_export_upload_uploads` | `project_id` | #589911 |
| `IssuableMetricImage` | `issuable_metric_image_uploads` | `namespace_id` | #589912 |
| `Organizations::OrganizationDetail` | `organization_detail_uploads` | `organization_id` | #589913 |
| `PersonalSnippet` | `snippet_uploads` | `organization_id` | #589914 |
| `Project` | `project_uploads` | `project_id` | #589915 |
| `Projects::ImportExport::RelationExportUpload` | `project_import_export_relation_export_upload_uploads` | `project_id` | #589916 |
| `Projects::Topic` | `project_topic_uploads` | `organization_id` | #589917 |
| `User` | `user_uploads` | `organization_id` | #589918 |
| `UserPermissionExportUpload` | `user_permission_export_upload_uploads` | `uploaded_by_user_id` | #589919 |
| `Vulnerabilities::ArchiveExport` | `vulnerability_archive_export_uploads` | `project_id` | #589920 |
| `Vulnerabilities::Export` | `vulnerability_export_uploads` | `organization_id` | #589921 |
| `Vulnerabilities::Export::Part` | `vulnerability_export_part_uploads` | `organization_id` | #589922 |
| `Vulnerabilities::Remediation` | `vulnerability_remediation_uploads` | `vulnerability_remediation_uploads` | #589923 |
## Related Links
- MR: [!221773](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/221773) - Add sharding key information to Geo upload_states table
- [Stop using the uploads table](https://gitlab.com/gitlab-org/gitlab/-/work_items/425484)
- [Geo Observability Phase 2 - Iteration 4](https://gitlab.com/groups/gitlab-org/-/work_items/16588)