Geo: Implement LFS object replication for organization migration
Problem
As part of organization migration from Legacy Cell to Protocell, the Geo team needs to implement special handling for LFS object replication. The lfs_objects table is now marked as cell-local and won't be replicated directly, requiring custom logic to ensure LFS objects are properly available on the target cell.
Background
The overall LFS migration strategy involves:
- ✅ Copy `oid` from `lfs_objects` to `lfs_object_projects` (group source code, #490482 (closed))
- ✅ Mark the `lfs_objects` table as cell-local (completed)
- 🔄 Migrate `lfs_object_projects` rows and nullify `lfs_object_id` (Geo team - this issue)
- 🔄 Ensure LFS objects exist on the target cell and link them (Geo team - this issue)
- ✅ Recycle unused `lfs_objects` on the source cell (existing functionality)
Tasks for Geo Team
Task 1: Migrate lfs_object_projects without lfs_object_id during PG data replication
When migrating an organization:
- Migrate `lfs_object_projects` rows for the organization (as normal)
- Nullify (or otherwise exclude from replication) the `lfs_object_id` reference during migration (this is the special part)
- Do NOT migrate `lfs_objects` rows (as usual for cell-local tables)
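The Task 1 rule can be sketched as a per-table row transform. This is a minimal Python sketch that assumes the replication pipeline passes each table's rows through a hook before writing them to the target cell. `transform_rows_for_migration`, `CELL_LOCAL_TABLES`, and the dict-based row shape are hypothetical and are not part of any existing Geo API. The `oid` column is kept on purpose, because Task 2 keys on it.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]

# Hypothetical: tables the migration treats as cell-local (never copied).
CELL_LOCAL_TABLES = {"lfs_objects"}


def transform_rows_for_migration(table: str, rows: Iterable[Row]) -> List[Row]:
    """Return the rows that should be written to the target cell for `table`."""
    if table in CELL_LOCAL_TABLES:
        # lfs_objects rows stay on the source cell; Task 2 recreates them.
        return []
    if table == "lfs_object_projects":
        # Copy each row but drop the foreign key: it points at an lfs_objects
        # ID on the source cell, which means nothing on the target cell.
        # The backfilled `oid` column is kept so the link can be rebuilt.
        return [{**row, "lfs_object_id": None} for row in rows]
    # Every other organization-scoped table is migrated unchanged.
    return list(rows)


# Usage (hypothetical data): the source row keeps its value, the copy is nulled.
source_rows = [{"id": 1, "project_id": 42, "oid": "abc123", "lfs_object_id": 7}]
migrated = transform_rows_for_migration("lfs_object_projects", source_rows)
```

Building new dicts (`{**row, ...}`) rather than mutating the input leaves the source-cell data untouched, which matters if the same rows are also read elsewhere in the pipeline.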
Task 2: Implement LFS object deduplication on target cell
After migrating lfs_object_projects rows:
- For each row in `lfs_object_projects WHERE lfs_object_id IS NULL`:
  - Upsert an `lfs_objects` row with the `oid`, returning its ID (atomic to avoid race conditions)
  - Set the `lfs_object_projects.lfs_object_id` field
  - Insert an `lfs_object_registry` row to make the rest of Geo do its thing
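The Task 2 loop can be sketched against SQLite using Python's standard `sqlite3` module. The schema is a simplified, hypothetical stand-in: the real tables have more columns, and for brevity all three tables share one database here. `create_schema`, `upsert_lfs_object`, and `link_unlinked_lfs_objects` are illustrative names. The sketch assumes a unique constraint on `lfs_objects.oid`. That constraint is what makes the upsert safe: two migrated projects that reference the same `oid` end up linked to a single `lfs_objects` row.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Simplified, hypothetical versions of the three tables involved.
    conn.executescript(
        """
        CREATE TABLE lfs_objects (
            id  INTEGER PRIMARY KEY,
            oid TEXT NOT NULL UNIQUE
        );
        CREATE TABLE lfs_object_projects (
            id            INTEGER PRIMARY KEY,
            project_id    INTEGER NOT NULL,
            oid           TEXT NOT NULL,
            lfs_object_id INTEGER REFERENCES lfs_objects(id)
        );
        CREATE TABLE lfs_object_registry (
            id            INTEGER PRIMARY KEY,
            lfs_object_id INTEGER NOT NULL UNIQUE,
            state         TEXT NOT NULL DEFAULT 'pending'
        );
        """
    )


def upsert_lfs_object(conn: sqlite3.Connection, oid: str) -> int:
    # INSERT OR IGNORE relies on UNIQUE(oid): a repeated insert for the same
    # oid is skipped instead of creating a duplicate row.
    conn.execute("INSERT OR IGNORE INTO lfs_objects (oid) VALUES (?)", (oid,))
    (lfs_object_id,) = conn.execute(
        "SELECT id FROM lfs_objects WHERE oid = ?", (oid,)
    ).fetchone()
    return lfs_object_id


def link_unlinked_lfs_objects(conn: sqlite3.Connection) -> int:
    """Link every lfs_object_projects row whose lfs_object_id is NULL."""
    linked = 0
    with conn:  # one transaction: commits on success, rolls back on error
        rows = conn.execute(
            "SELECT id, oid FROM lfs_object_projects WHERE lfs_object_id IS NULL"
        ).fetchall()
        for row_id, oid in rows:
            lfs_object_id = upsert_lfs_object(conn, oid)
            conn.execute(
                "UPDATE lfs_object_projects SET lfs_object_id = ? WHERE id = ?",
                (lfs_object_id, row_id),
            )
            # A registry row is what hands the file itself over to Geo syncing.
            conn.execute(
                "INSERT OR IGNORE INTO lfs_object_registry (lfs_object_id) VALUES (?)",
                (lfs_object_id,),
            )
            linked += 1
    return linked


# Usage: two projects share oid "aaa", so only two lfs_objects rows result.
conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.executemany(
    "INSERT INTO lfs_object_projects (project_id, oid) VALUES (?, ?)",
    [(1, "aaa"), (2, "aaa"), (3, "bbb")],
)
conn.commit()
linked = link_unlinked_lfs_objects(conn)
```

SQLite serializes writers, so the insert-then-select pair cannot race inside one transaction. In PostgreSQL the usual single-statement form is `INSERT ... ON CONFLICT (oid) DO UPDATE SET oid = EXCLUDED.oid RETURNING id`. A plain `ON CONFLICT DO NOTHING` returns no row when the `oid` already exists, which is why a no-op update is commonly used instead. Because only `NULL` rows are selected, re-running the loop is a no-op.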
Also:
- Ensure there are sufficient automated (unit? QA?) tests to meet GitLab's standards
- Smoke test locally
Related
- Blocked by Add sharding key for `lfs_objects` (#490482 - closed), because `lfs_object_projects.oid` needs to be backfilled and maintained in order for us to do this
- Somewhat blocked by Organization migration: Replicate PostgreSQL data (&18462), or at least by some kind of Postgres replication for cells in GDK POC of Geo Protocell Mode (#571916), because we need to do Task 1 on top of Postgres replication.
Edited by Michael Kozono