Mid/long term direction for Geo using federation
I wasn't sure where to put this, but I needed a place to dump some thoughts.
I feel we're at a point where we can, and need to, decide which direction we want to go with ~Geo, architecture-wise. Geo currently works as a single read/write primary with multiple read-only secondary nodes. Although we now allow pushing to a secondary (the push is proxied or redirected), we still do not have fully writable secondary nodes.
We can continue making the secondary nodes more user-friendly and more writable, for example by implementing https://gitlab.com/gitlab-org/gitlab-ee/issues/3764. But is that the way we want to go?
Also, I see other problems pop up, that we cannot solve with the current architecture:
- Full selective sync, also on database level:
  - The other day a large Geo customer asked for this
  - This might also help to enable Geo on gitlab.com gradually, because the current implementation of selective sync might not scale very well
- We've been looking into Logical replication in https://gitlab.com/gitlab-org/gitlab-ee/issues/7420:
  - Reading that conversation, it doesn't seem that it will help us in any way, and it might just complicate things more
  - Also, Logical Decoding isn't making things easier
- Another proposal was using Materialized views https://gitlab.com/gitlab-org/gitlab-ee/issues/5398:
  - So far that also doesn't seem to be a silver bullet
Possible direction
Hybrid synchronization https://gitlab.com/gitlab-org/gitlab-ee/issues/623 might help us solve the problems above, although I feel it's hard to accomplish with the current architecture.
So maybe we should move the direction of Geo toward https://gitlab.com/gitlab-org/gitlab-ee/issues/4517.
But that's scary, and it might involve building Geo again from the ground up, with a different mindset.
It could involve building on top of https://gitlab.com/gitlab-org/gitlab-ce/issues/4013, something that could be built on https://github.com/forgefed/forgefed, which seems to have quite some interest in the FOSS community.
There is epic &260, and ~Create is already building features that involve mailing patches to a GitLab instance.
Motivation
- Having a writable node at every Geo location will reduce write latency, and the cluster can handle downtime of the primary
- Setup would be easier: each node in the Geo cluster is set up identically, without any need for a tracking database, streaming replication, etc.
- A DR scenario will also be easier. The "distributed cluster" can lose any node and still remain running. Promoting a secondary to a primary is currently very cumbersome.
Motivations to keep current Geo architecture
Although there might be motivations to change direction, there are also reasons to keep the current architecture:
- Having an SSOT (single source of truth) means conflict resolution is never needed
- PostgreSQL replication, which we rely on today, is very reliable
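
To make the conflict-resolution trade-off a bit more concrete, here is a minimal sketch of what multiple writable nodes would have to deal with. It is not how Geo works, and it is not a proposed design: `Node`, `dominates` and `sync` are hypothetical names, and the version-vector approach is just one common technique I picked for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical writable node: stores key -> (value, version vector)."""
    name: str
    data: dict = field(default_factory=dict)

    def write(self, key, value):
        # Bump this node's own counter in the key's version vector.
        _, clock = self.data.get(key, (None, {}))
        clock = dict(clock)
        clock[self.name] = clock.get(self.name, 0) + 1
        self.data[key] = (value, clock)


def dominates(a, b):
    """True if version vector `a` has seen every write recorded in `b`."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def sync(src, dst, key):
    """Replicate `key` from src to dst (assumes src already holds `key`)."""
    value, clock = src.data[key]
    if key not in dst.data or dominates(clock, dst.data[key][1]):
        dst.data[key] = (value, dict(clock))
        return "applied"
    if dominates(dst.data[key][1], clock):
        return "stale"      # dst already has a newer version
    return "conflict"       # concurrent writes: someone has to resolve this


# Single writer (today's model): replication always applies cleanly.
primary, secondary = Node("primary"), Node("secondary")
primary.write("title", "A")
print(sync(primary, secondary, "title"))   # applied

# Two writers: both nodes change the same key before syncing.
primary.write("title", "B")
secondary.write("title", "C")
print(sync(primary, secondary, "title"))   # conflict
```

With a single primary the `conflict` branch can never be reached, which is exactly the SSOT advantage listed above. A federated design would need an answer for that branch for every kind of data we replicate.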