Alternatives to FDW / tracking databases for GitLab Geo

In %10.0, GitLab Geo introduced the log cursor and a separate PostgreSQL tracking database (as Geo secondaries have a read-only copy of the main database, but need to store information about whether a project or file has been synced). Some more details on that here: https://docs.gitlab.com/ee/development/geo.html

We have to compare the tracking database to the main database in a number of cases:

Scheduling backfill for file and project replication
Calculating percentage completion of file and project replication

Originally, we performed these comparisons by selecting every project ID or every file ID, pulling them into Ruby, and then comparing that data with the tracking database's state.

We attempted to improve this by using Postgresql's Foreign Data Wrappers support. This allows us to perform conceptually nicer queries, with the serialization/deserialization/comparison costs being held in the postgresql database (and c/c++ code) rather than in ruby, which should be nicer.

However, we've not seen fundamentally huge leaps in performance. https://gitlab.com/gitlab-org/gitlab-ee/issues/4548#note_54369807 discusses the situation for scheduling backfill, and calculating percentage completion depends on an fdw feature ("aggregate pushdown") which is only available in postgresql 10.0, so we don't benefit there either.

I'd like to think that there is a non-fdw, non-pull-all-the-ids-into-ruby solution for this problem. Being able to remove all the FDW code from gitlab and omnibus would be a great simplification (albeit, high-churn). Being able to remove the tracking database entirely would be even better.

The only option I can think of that would satisfy both right now is to have the secondaries read from their local, read-only copy of the database, but write tracking information to the postgresql master - which is on the primary, in a separate datacentre, of course.

So we'd have a geo_project_registry table in the main database, which might look like:

create_table :geo_project_registry
  t.references :geo_node, null: false
  t.references :project, null: false
  ...

The same would apply for the file registry, of course.

Is this a feasible option? I know we have a similar postgresql setup in HA (including for gitlab.com) using repmgr, and we're talking about switching to repmgr to simplify setup and make ~"Geo DR" with multiple nodes possible.

We've also talked about enabling UI actions on the secondary by having it make API calls to the primary: https://gitlab.com/gitlab-org/gitlab-ee/issues/3764 but if we could proxy database writes in this way, we could do this much more easily.

Other options?

/cc @stanhu @brodock @to1ne @_stark @dbalexandre @digitalmoksha @ibaum

Edited Jan 12, 2018 by Nick Thomas