At the moment, omnibus-gitlab users that wish to setup Gitaly HA via Praefect must figure out themselves how to setup its database. The gitlab.rb template points to hostnames postgres.external and postgres.internal, which are not setup by omnibus-gitlab. Two routes users can take:
Point Praefect to use the Postgresql cluster provisioned by omnibus-gitlab: This is the simplest setup. However, we explicitly discourage doing this if using Geo, because of replication consequences.
Provision a separate Postgresql cluster to use with Praefect: This is the recommended configuration with Geo, but this is not possible/obvious with omnibus-gitlab, leaving the user to setup Postgresql (and perhaps Patroni/PgBouncer) all on their own
Point Praefect to use the Postgresql cluster provisioned by omnibus-gitlab: This is the simplest setup. However, we explicitly discourage doing this if using Geo, because of replication consequences.
I don't think this is true, @toon as our resident Geo expert, could you verify and maybe update the documentation? I'm fairly certain that at least our QA setup uses 1 DB for everything.
@zj-gitlab It's discouraged, but certainly possible to.
Let me add a little drawing:
graph BT subgraph Primary G1(Praefect) --> psql1[(Primary<br>PostgreSQL<br>-`rails`<br>-`praefect`)] end subgraph Secondary psql2[(Secondary<br>PostgreSQL<br>-`rails`<br>-`praefect`)] G2(Praefect) --> psql2b[(Geo tracking<br>PostgreSQL<br>-`praefect`)] end psql2 -->|PostgreSQL replication| psql1
Having PostgreSQL replication (in most cases streaming replication) set up between the Geo Primary and Geo Secondary has these consequences:
All databases in the cluster are replicated: gitlabhq_production for rails and praefect_production for praefect
The databases are read-only
The second consequence makes it impossible to use the main postgresql instance managed by omnibus for Praefect on the Geo Secondary. An alternative approach (drawn here) is to store the praefect database in the postgresql_geo instance managed by omnibus. We actually do a similar thing in GDK. But due to the first consequence, the Praefect state of the Geo primary is replicated to the secondary. This is completely useless, wasteful of storage and network bandwidth, and even dangerous if Praefect on the secondary is configured to use that database.
@toon Maybe a pragmatic approach would be to create a database for Praefect per Geo area? E.g. we'd have a praefect_prod_eu_ams and a praefect_prod_us_ny. That would work if we can point a Praefect to a specific database?
What are other options? Can we tell PG not to sync a certain table, and have each zone write their own data?
Maybe a pragmatic approach would be to create a database for Praefect per Geo area? E.g. we'd have a praefect_prod_eu_ams and a praefect_prod_us_ny. That would work if we can point a Praefect to a specific database?
@zj-gitlab I don't see how that helps. If you put them in postgresql they'd be read-only in all Geo secondary instances, if you'd put them in postgresql-geo they won't be replicated across Geo-locations anyway, so there's no need use unique names.
What are other options?
We've been breaking our head over this before, but I don't know.
At the moment Geo uses streaming replication (PSQL built-in) or archiving replication (using Wall-G for example). Both techniques replicate the whole cluster (i.e. all databases) and make the replica read-only. An alternative is to use logical replication. It allows you to be selective on databases and tables to synchronize, and it keeps the replica writable. But it's harder to implement. For example it does not replicate DDL schema, so you'd need to be meticulously do upgrades in the correct order. At Geo we've been looking into this option, but so far stayed away from it, because it was tricky.
Can we tell PG not to sync a certain table, and have each zone write their own data?
So with current techniques: no.
Shocking new proposal
It suddenly hit me, why don't we always run the praefect_production database on postgresql-geo? Consider postgresql-geo the PG instance that bound to that particular Geo-location (and not the GitLab Geo feature). It will make it simple, because the location praefect_production is always the same for Geo Primary and Secondary, and data ain't replicated to Geo Secondary nodes. I'm not sure how easy it is atm to run postgresql-geo on a non-Geo node, or a Geo-primary. But it must be doable.
But that does not fix all issues, but the Geo tracking DB has no ability to be set up in a HA setup. I've been told the Geo team is working on this (gitlab#229061 (closed)), but might take a few months before it's complete. And the issue@grantyoung opened, will also apply here.
I'm not sure how easy it is atm to run postgresql-geo on a non-Geo node, or a Geo-primary. But it must be doable.
I don't know either. This may also become useful if customers want to switch their primary location around based on business requirements e.g. every three months.
But that does not fix all issues, but the Geo tracking DB has no ability to be set up in a HA setup. I've been told the Geo team is working on this (gitlab#229061 (closed)), but might take a few months before it's complete. And the issue@grantyoung opened, will also apply here.
Yes, once Patroni works reliably, this should be possible.
@mjwood Can you please have a look at my "Shocking new proposal" above? I think it's an important decision we need to make for the direction of embedding the Praefect database in omnibus.
I noticed groupdistribution is working on adding the Praefect database (see !5252) to the main postgresql cluster and this proposal is pretty much incompatible with that.
I think the geo-postgres idea seems good. We already have it tooled to be able to run on the same node as the regular db, or on its own node, and it already has its own patroni docs now. It seems like enabling that path for praefect in omnibus would be a lot easier than re-doing all that work for praefect.
The cons are that it would introduce a shared ownership of db config for that database. So any changes need to work for both the gitaly team and the geo team.
@toon - Sorry for the late reply. I think it makes a lot of sense to go with your proposal as I think it helps to solve some of the issues groupdistribution is working through as well.
I think @toon's proposal is interesting. It probably makes sense to maintain the possibility of running each database in a separate instance, but defaulting to a shared instance, and defaulting that to enabled whenever Geo and/or Praefect is enabled seems feasible.
Large Premium customer is interested in this, in order to switch a non-Geo, but scaled PG system (3 pgbouncer + LB + 3 repmgrd) to the new Praefect DB (internal ticket).