# Alternative Geo scheduler
This approach describes an architecture where Geo uses the simplest replication strategy I can think of. Migration would not require much effort, and it does not require changing everything. The main benefit here is that we don't need to compare huge data sets.
## Primary data design
The geo_event_log table stores all the events, including those for uploaded files, LFS objects, and so on. It should not include any assets stored on external object storage.
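For illustration, a minimal sketch of what such a table could look like, assuming a simple generic layout (the column names and types here are hypothetical, not the actual GitLab schema):

```sql
-- Hypothetical, simplified layout for geo_event_log (not the actual GitLab schema).
-- Each row records one replication-relevant change on the primary.
CREATE TABLE geo_event_log (
  id          bigserial   PRIMARY KEY,
  event_type  text        NOT NULL, -- e.g. 'repository_updated', 'upload_created', 'lfs_object_created'
  subject_id  bigint      NOT NULL, -- ID of the project, upload, or LFS object the event refers to
  created_at  timestamptz NOT NULL DEFAULT now()
);
```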
## Secondary data design
Table sync_registry:
| id | last_processed_event_id | last_processed_lfs_object_id | last_processed_upload_id | last_processed_project_id |
|---|---|---|---|---|
| 1 | 456 | 23 | 56 | 23 |
Failures table: the exact design isn't important for this document; something like the existing registry table.
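A minimal sketch of these tracking-database tables, with hypothetical column types and a hypothetical sync_failures name for the failures table:

```sql
-- Sketch of the tracking tables on a secondary (names and types are assumptions).
CREATE TABLE sync_registry (
  id                           bigserial PRIMARY KEY,
  last_processed_event_id      bigint NOT NULL DEFAULT 0,
  last_processed_lfs_object_id bigint NOT NULL DEFAULT 0,
  last_processed_upload_id     bigint NOT NULL DEFAULT 0,
  last_processed_project_id    bigint NOT NULL DEFAULT 0
);

-- Failures table, roughly analogous to the existing registry table.
CREATE TABLE sync_failures (
  id          bigserial PRIMARY KEY,
  event_id    bigint NOT NULL,    -- ID of the failed geo_event_log event
  event_type  text   NOT NULL,
  subject_id  bigint NOT NULL,
  retry_count int    NOT NULL DEFAULT 0,
  last_error  text,
  UNIQUE (event_type, subject_id) -- lets duplicate failures collapse into one row
);
```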
## Streaming replication
As we have access to the geo_event_log table on each secondary server, we can replay new events. The position is tracked using the sync_registry.last_processed_event_id column.
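A sketch of one replay iteration, assuming the simplified schema above. The event log lives in the replicated (read-only) primary database while sync_registry lives in the local tracking database, so the cursor is read and written separately:

```sql
-- 1. Read the cursor from the tracking database on the secondary.
SELECT last_processed_event_id FROM sync_registry WHERE id = 1;

-- 2. Fetch the next batch of events from the replicated geo_event_log
--    (456 stands in for the cursor value read in step 1).
SELECT id, event_type, subject_id
FROM geo_event_log
WHERE id > 456
ORDER BY id
LIMIT 1000;

-- 3. After the batch has been replayed, advance the cursor in the tracking database.
UPDATE sync_registry SET last_processed_event_id = 1456 WHERE id = 1;
```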
## Retry
All event IDs that have failed and have to be retried are stored locally in the tracking database. Some events can also be collapsed; for example, a repository_update event does not have to be put into the failures table twice for the same repository, as retrying it more than once makes no sense.
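One way to record a failure while collapsing duplicates, assuming the unique constraint on (event_type, subject_id) from the hypothetical sync_failures sketch above:

```sql
-- Record a failed repository_updated event for project 23; if the same failure
-- is already pending, bump its retry counter instead of adding another row.
INSERT INTO sync_failures (event_id, event_type, subject_id, last_error)
VALUES (789, 'repository_updated', 23, 'git fetch timed out')
ON CONFLICT (event_type, subject_id)
DO UPDATE SET event_id    = EXCLUDED.event_id,
              retry_count = sync_failures.retry_count + 1,
              last_error  = EXCLUDED.last_error;
```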
## Backfill strategy
Every secondary node has a created_at field, so we know which items have to be backfilled by comparing their own created_at column against it; anything created after the node was added is already covered by the event log. For example, to retrieve all projects that have to be backfilled, we request them using the clause WHERE projects.created_at <= #{current_node_created_at}. We need to fetch the list ordered by ID so that we can use sync_registry.last_processed_project_id as a cursor.

So the clause above becomes WHERE projects.created_at <= #{current_node_created_at} AND projects.id > #{last_processed_project_id}.
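Put together, a backfill batch for projects could be fetched like this (the literal values stand in for the node's created_at and the stored cursor):

```sql
-- Fetch the next batch of projects to backfill: only projects that already
-- existed before this secondary was added, ordered by ID so the cursor can advance.
SELECT id
FROM projects
WHERE projects.created_at <= '2017-09-01 00:00:00+00' -- current_node_created_at
  AND projects.id > 23                                -- sync_registry.last_processed_project_id
ORDER BY projects.id
LIMIT 1000;

-- After the batch is scheduled, advance the cursor in the tracking database.
UPDATE sync_registry SET last_processed_project_id = 1023 WHERE id = 1;
```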
All failures should be treated the same way as for streaming replication.
## Pruning the geo_event_log table
All old events that are not referenced by any configured Geo node have to be pruned. We can request the last processed event ID from every secondary node via the API; this is actually how it works now.
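A sketch of the pruning step on the primary, assuming the minimum last_processed_event_id across all secondaries has already been collected over the API (the literal value below is a placeholder for that minimum):

```sql
-- Delete events that every configured secondary has already processed.
-- 456 stands in for MIN(last_processed_event_id) across all secondary nodes.
DELETE FROM geo_event_log
WHERE id <= 456;
```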
## Event collapsing
Events can be grouped by type so that we don't replay several updates when we really only need one of them. Only repositories can benefit from this, as file events are idempotent.
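As an illustration, repository update events for the same project could be collapsed to one row per project when reading a batch, using the simplified schema sketched earlier:

```sql
-- Keep only the newest repository_updated event per project within the batch;
-- replaying the latest one is enough to bring the repository up to date.
SELECT DISTINCT ON (subject_id) id, subject_id
FROM geo_event_log
WHERE event_type = 'repository_updated'
  AND id > 456 -- last_processed_event_id
ORDER BY subject_id, id DESC;
```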
## Problems
- We need to implement some approach to prevent the situation where a sync has failed but has not been put into the failures table. I think there are many ways to do that.