Implementing Patroni in omnibus
Patroni itself is a Python package that can be installed with a single pip command. However, one of its dependencies, psycopg2, installs shared libraries with unsatisfied dependencies, which breaks the omnibus build. The same is true of our production installation, although it hasn't broken anything there, yet!
For fresh installations, ideally Patroni would create the Postgres cluster itself (i.e. running initdb, relevant config). It can also attach itself to a running Postgres instance, but it then tries to manage that instance afterwards (i.e. HUP-ing, restarting, ...), so it will always conflict with the runit service we create for Postgres.
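For the fresh-installation path, Patroni's bootstrap section of patroni.yml is what drives initdb on first start. A minimal sketch, assuming an illustrative data directory and initdb arguments (none of these values are our actual settings):

```yaml
# Hypothetical patroni.yml fragment: letting Patroni create the cluster
# itself instead of attaching to a pre-existing Postgres instance.
bootstrap:
  initdb:                  # arguments passed to initdb on first start
    - encoding: UTF8
    - data-checksums
postgresql:
  data_dir: /var/opt/gitlab/postgresql/data   # illustrative path
```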
Patroni takes control of postgresql.conf by moving it to postgresql.base.conf and then including the latter into the former (i.e. using the include directive). It can provision pg_hba.conf (relevant config), but it will skip doing so if we provide a path for it in the Postgres parameters config section.
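The "provide a path" escape hatch mentioned above amounts to setting hba_file under Postgres parameters; a hypothetical fragment (the path is an assumption, not our production layout):

```yaml
postgresql:
  parameters:
    # When hba_file is set here, Patroni skips provisioning pg_hba.conf
    # and leaves that file for us to manage.
    hba_file: /var/opt/gitlab/postgresql/data/pg_hba.conf   # illustrative path
```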
Patroni expects configuration for the credentials of a superuser and a replication user; it can create them by itself if a different piece of configuration is present.
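A sketch of the two pieces of configuration involved (usernames and passwords below are placeholders): the authentication section supplies the credentials Patroni uses, while the bootstrap users section asks Patroni to create users during cluster initialization:

```yaml
postgresql:
  authentication:
    superuser:
      username: postgres        # placeholder credentials
      password: CHANGEME
    replication:
      username: replicator
      password: CHANGEME
bootstrap:
  users:                        # only honored during cluster bootstrap
    replicator:
      password: CHANGEME
      options:
        - replication           # grant the REPLICATION role attribute
```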
Patroni requires a Consul agent (relevant config) to be running on all members of the cluster. Once the Patroni process starts, it checks for an existing cluster leader or assumes leadership itself. In the former case, it fetches a base backup from the master and then continues with streaming replication; Patroni takes care of creating replication slots. The leader/master needs to have the relevant pg_hba entries to allow replication connections.
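The Consul agent address and the replication pg_hba entries on the leader could look roughly like this (addresses, network ranges, and the user name are illustrative assumptions):

```yaml
consul:
  host: 127.0.0.1:8500   # local Consul agent required on every member
bootstrap:
  pg_hba:                # written into pg_hba.conf by Patroni during bootstrap
    - host replication replicator 10.0.0.0/8 md5
    - host all all 10.0.0.0/8 md5
```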
There are global configurations for the Patroni cluster that are stored in Consul upon cluster initialization. Subsequent changes to them in patroni.yml do not update them in Consul; they need to be updated either through the Patroni API or through patronictl edit-config (we use the latter in production).
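The Consul-stored globals are the ones under bootstrap.dcs; after initialization, editing this block in patroni.yml has no effect, which is why the API or patronictl edit-config is needed. An illustrative fragment (all values are assumptions, not our production settings):

```yaml
bootstrap:
  dcs:                          # persisted to Consul at cluster init time
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
```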
In production we use an internal load balancer in front of the whole Patroni cluster, with only one node, the master, marked as healthy. The load balancer pings all Patroni HTTP endpoints, of which only one (the master) returns 200 while the rest (the replicas) return 503. The Patroni docs suggest using HAProxy to the same effect; perhaps nginx could be used as a replacement, but we didn't experiment with either of them.
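The HTTP endpoint the load balancer health-checks is Patroni's REST API, configured in the restapi section; a GET on / (or /master) returns 200 only on the leader and 503 on replicas, which is what produces the one-healthy-node behavior. A sketch with illustrative addresses and the default port:

```yaml
restapi:
  listen: 0.0.0.0:8008            # endpoint the load balancer health-checks
  connect_address: 10.0.0.1:8008  # illustrative per-node address
```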
We are trying to move away from using a load balancer to using pgbouncer + Consul instead; details can be found in gitlab-com/www-gitlab-com!18844.