POC of Geo Protocell Mode
Problem/Proposal
We had a couple of ideas for using Geo mostly as-is to migrate data to a Protocell. (See Meeting notes below.)
In this issue, we should do a POC of the Protocells mode idea, timeboxed to 2 days.
Meeting notes
From Notes - Org Data Migration - Sync/Office Hours:
Sep 18, 2025 | [REC] Org Data Migration Sync
- Michael Kozono: Geo will probably need some modifications to configure a Protocell as a secondary site while the Protocell is a live, writable site at the same time. I haven’t looked at specifically what modifications will be needed and how much work that will be. I think I need to do a POC locally.
- Michael Kozono: There was the idea to run Rails processes on the side that have a different configuration (configured as a Geo secondary site).
- Douglas Alexandre: What about Protocells mode, distinct from primary and secondary site? Runs all Geo jobs.
- Michael Kozono: What about just normal jobs + Geo secondary site jobs?
- Douglas Alexandre: I think it’s fine to run the primary checksum jobs.
- Douglas Alexandre: We need to modify CronManager.
- Douglas Alexandre: Need to deploy tracking DB.
- Douglas Alexandre: Geo Health status. Not streaming replication, logical replication.
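The job-selection options raised in these notes could be sketched roughly as follows. This is a minimal illustration only: the module name, the job names, and the `enabled_jobs` method are all hypothetical and are not the real CronManager API. It also assumes, as a simplification, that a regular secondary site runs only secondary jobs.

```ruby
# Hypothetical sketch: none of these names come from GitLab's real CronManager.
module CronSelectionSketch
  # Placeholder job names, for illustration only.
  NORMAL_JOBS           = %w[normal_job_a normal_job_b].freeze
  PRIMARY_CHECKSUM_JOBS = %w[primary_checksum_job].freeze
  SECONDARY_JOBS        = %w[secondary_sync_job secondary_verification_job].freeze

  # Returns the names of the jobs that would be enabled for a given site mode.
  def self.enabled_jobs(mode)
    case mode
    when :primary
      NORMAL_JOBS + PRIMARY_CHECKSUM_JOBS
    when :secondary
      # Simplifying assumption: a regular secondary runs only secondary jobs.
      SECONDARY_JOBS
    when :protocell
      # Live, writable site that also pulls data: normal jobs,
      # primary checksum jobs, and secondary site jobs together.
      NORMAL_JOBS + PRIMARY_CHECKSUM_JOBS + SECONDARY_JOBS
    else
      raise ArgumentError, "unknown mode: #{mode.inspect}"
    end
  end
end
```

Keeping the mode-to-jobs mapping in one place would make it easy to compare the "all Geo jobs" and "normal jobs + secondary jobs" variants during the POC.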
Plan
Partially configure Geo so that the Protocell acts as a secondary Geo site of the Legacy Cell, without any PG replication and without breaking anything. Fix or bandaid things along the way, and potentially introduce a third type of Geo site: Protocells mode. Stop at the timebox of 2 days.
- Legacy Cell: Set geo_node_name
- Legacy Cell: rake geo:set_primary_node
- Legacy Cell: In UI, add a secondary site with the Protocell attributes
- Legacy Cell: Dump geo_nodes table
- Protocell: Set geo_node_name
- Protocell: Insert geo_nodes rows
Replicate org 1. Bandaid things along the way. Stop at timebox of 2 days.
- Legacy Cell: Block org 1 users
- Legacy Cell: Set selective sync by org 1 (starts checksumming non-PG data)
- Legacy Cell: Dump PG data for org 1
- Protocell: Insert PG data for org 1
- Protocell: Set selective sync by org 1
- Wait for replication of non-PG data
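The final "wait" step could be scripted as a simple polling loop. The sketch below is an assumption-laden illustration: `wait_for_replication` and the `fetch_counts` callable are hypothetical, and how sync progress is actually read from Geo is not specified here.

```ruby
# Hypothetical sketch: `fetch_counts` stands in for however sync progress is
# read; it must return a hash with :synced and :total counts.
def wait_for_replication(fetch_counts, interval: 5, max_attempts: 60,
                         sleeper: ->(seconds) { sleep(seconds) })
  max_attempts.times do |attempt|
    counts = fetch_counts.call
    # Done once every tracked item has been synced.
    return attempt + 1 if counts[:synced] >= counts[:total]

    sleeper.call(interval)
  end
  nil # gave up: replication did not finish within max_attempts polls
end
```

It returns the number of polls needed, or nil if replication did not finish in time. The `sleeper` parameter exists only so the loop can be exercised without real delays.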
Ideas
Protocells mode would be relevant at call sites of Gitlab::Geo.primary? and Gitlab::Geo.secondary?. We could add Gitlab::Geo.protocell?.
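To make the idea concrete, here is a minimal standalone sketch of a three-way mode with a protocell? predicate. GeoModeSketch is a hypothetical stand-in, not the real Gitlab::Geo module, and the `mode` accessor plus the two helper predicates are assumptions made for illustration.

```ruby
# Hypothetical stand-in for illustration; this is not the real Gitlab::Geo.
module GeoModeSketch
  MODES = %i[primary secondary protocell].freeze

  class << self
    attr_reader :mode

    def mode=(value)
      raise ArgumentError, "unknown mode: #{value.inspect}" unless MODES.include?(value)

      @mode = value
    end

    def primary?
      mode == :primary
    end

    def secondary?
      mode == :secondary
    end

    def protocell?
      mode == :protocell
    end

    # Example call-site helpers (assumed): a Protocell is writable like a
    # primary site, but also pulls data like a secondary site.
    def accepts_writes?
      primary? || protocell?
    end

    def pulls_replicated_data?
      secondary? || protocell?
    end
  end
end

GeoModeSketch.mode = :protocell
GeoModeSketch.accepts_writes?        # => true
GeoModeSketch.pulls_replicated_data? # => true
```

Each existing primary? or secondary? call site would need a decision about whether Protocells mode should follow the primary branch, the secondary branch, or both. Helpers like the two above are one way to record that decision explicitly.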