Move Database replication to the API layer
Creating this issue to discuss moving database replication to the API layer, as a follow-up to the proposal from Josh.
We currently rely on PostgreSQL streaming replication to replicate the database to Geo secondaries. This replicates the entire database and has no concept of organisations, groups or projects.
The streaming replication approach comes with some limitations:
- It does not work across vendors when using managed databases.
- It makes migrating self-managed (SM) customers to GitLab Dedicated challenging and requires workarounds.
- Customers cannot have an on-prem PostgreSQL instance replicate to a managed instance, or vice versa.
- It does not allow scoping to the organisation, group and project level.
- It cannot be used for cell mover scenarios, where we want to migrate an organisation from one cell to another for load balancing or other reasons.
- It cannot be used for replicating data on multi-tenanted platforms such as cells and GitLab.com.
- It is one of the limitations blocking GitLab Federation and the use of Geo for data compliance use cases. For example, Geo can selectively sync every type of data except database data.
Benefits of moving to the API layer:
- Replication can be performed at the organisation, group and project level, enabling organisations to be moved between multi-tenanted cells.
- Allows Geo to fully restrict the data replicated to secondary sites. This addresses some of the data compliance use cases and paves the way for Federation.
- Supports some level of version skew, because replication goes through an API layer rather than direct DB replication.
- Supports migrating across cloud providers and is agnostic to DB-specific features.
- Makes migrating SM customers to GitLab Dedicated much easier.
- Would potentially simplify Geo setup and configuration by eliminating the need to set up PostgreSQL replication.
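
To make the scoping benefit more concrete, here is a minimal sketch of how an API-layer replicator could filter changes by organisation, group or project before sending them to a secondary. This is only an illustration for discussion, not a proposed design: `ChangeEvent`, `ReplicationScope` and `events_for_scope` are hypothetical names, and the sketch assumes every replicated record carries the IDs of the organisation, group and project that own it.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical record change emitted by the primary site."""
    organisation_id: int
    group_id: Optional[int]
    project_id: Optional[int]
    table: str
    payload: dict


@dataclass(frozen=True)
class ReplicationScope:
    """Hypothetical selector for a secondary; a field left as None matches anything."""
    organisation_id: Optional[int] = None
    group_id: Optional[int] = None
    project_id: Optional[int] = None

    def matches(self, event: ChangeEvent) -> bool:
        # An event is in scope when every field that is set equals the event's value.
        pairs = (
            (self.organisation_id, event.organisation_id),
            (self.group_id, event.group_id),
            (self.project_id, event.project_id),
        )
        return all(wanted is None or wanted == actual for wanted, actual in pairs)


def events_for_scope(
    events: Iterable[ChangeEvent], scope: ReplicationScope
) -> Iterator[ChangeEvent]:
    """Yield only the events that a secondary with this scope should receive."""
    return (event for event in events if scope.matches(event))
```

With streaming replication there is no equivalent filtering point, since the whole database is shipped. In a sketch like this, a cell mover would request `ReplicationScope(organisation_id=...)` for the organisation being moved, while a compliance-restricted secondary would be configured with a narrower scope.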