Patroni cluster documentation queries
## Problem to solve

Resolve snags in the Patroni cluster documentation.
## Further details
[1] References to `repmgr` when setting up the database cluster, main docs. In the context of setting up Patroni:

[a] a `repmgr` database

> A database user is created with read-only access to the `repmgr` database.

Is any sort of database needed for Patroni?
[b] An account called `repmgr` for PgBouncer

> The service will have a regular database user account generated for it. This defaults to `repmgr`.

Is this account still created? If so, under a new name?
[2] Reference Architectures

todo

> Copy the `/etc/gitlab/gitlab-secrets.json` file from your Consul server, and replace the file of the same name on this server. If that file is not on this server, add the file from your Consul server to this server.

[a] Three Consul servers are built. There doesn't appear to be a step to sync them to each other, nor (as far as I can see) to supply any secrets in `gitlab.rb`.

Is this required? If it is, how does the cluster work with only the secrets from one arbitrary Consul node?
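For reference, the quoted step amounts to something like this minimal sketch (`CONSUL_SERVER` is a placeholder for whichever Consul node the secrets are taken from):

```shell
# Copy the shared secrets file from a Consul server onto this node,
# then reconfigure so services pick up the synced secrets.
scp root@CONSUL_SERVER:/etc/gitlab/gitlab-secrets.json /etc/gitlab/gitlab-secrets.json
sudo gitlab-ctl reconfigure
```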
[b] Also, looking in the main docs for Patroni, it says (source):

> Passwords are stored in the following locations:
>
> - `/etc/gitlab/gitlab.rb`: hashed
> - `/var/opt/gitlab/pgbouncer/pg_auth`: hashed
> - `/var/opt/gitlab/consul/.pgpass`: plaintext

We should document WHY the secrets file is needed. If customers get malfunctions because their secrets are out of sync, the WHY will help troubleshoot this.

We've done this for Praefect: !58962 (diffs)

> Omnibus GitLab installations can use `gitlab-secrets.json` for `GITLAB_SHELL_SECRET_TOKEN`.
[3] Port requirements - Consul

- Consul ports reference
- Consul config reference (search for "This is a nested object that allows setting the bind ports")

We only document 8300 (`server`) and 8500 (`http`).
[a] Consul port 8301, `serf_lan`, is not included in the port requirements (source).

In researching Consul port usage, I concluded that the clients and nodes locate each other using this port, so in full the configuration would be:

```ruby
consul['configuration'] = {
  retry_join: ['172.18.0.101:8301', '172.18.0.102:8301', '172.18.0.103:8301'],
}
```

I have built clusters configured this way, and it works; see the verification sketch after the citation below.
Citation: mention of configuring `retry_join`, which uses SERF_LAN:

> If you are unable to use auto-join, you can also follow the instructions in either of the auto-join sections, but instead of using a `provider` key in the `-retry-join` flag, you would need to pass the address of at least one Consul server, e.g.: `-retry-join=$CONSUL_SERVER_IP:$SERVER_SERFLAN_PORT`.
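As a quick verification (a sketch, assuming the Consul binary bundled by Omnibus GitLab at its usual path), gossip-pool membership over the Serf LAN port can be checked on any node:

```shell
# Lists every agent that has joined the gossip pool via 8301;
# healthy members report an "alive" status alongside their LAN address.
/opt/gitlab/embedded/bin/consul members
```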
[b] Consul docs seem to strongly advise allowing 8302 (`serf_wan`):

> The Serf WAN port. Default 8302. Set to -1 to disable. Note: this will disable WAN federation which is not recommended. Various catalog and WAN related endpoints will return errors or empty results. TCP and UDP.
[c] Do we not use the `dns` port, 8600? (This being the case, it can be disabled with `-1`.)
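If the answer is that we don't use it, a minimal sketch of disabling it via Consul's nested `ports` object follows (an assumption about how Omnibus passes `consul['configuration']` through to Consul's JSON config; untested here):

```ruby
# /etc/gitlab/gitlab.rb
consul['configuration'] = {
  ports: {
    dns: -1,  # assumption: -1 disables Consul's DNS interface (default 8600)
  },
}
```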
[4] Port requirements / connection flow
todo
[a] The documentation implies both the Consul servers and agents need to accept incoming connections. Is this really the case?

> Consul servers and agents connect to each other
[b] Patroni connectivity. Port 8008 is listed as of interest.

> Patroni actively manages the running PostgreSQL processes and configuration.

Patroni runs on the PostgreSQL servers, and I understand it isn't interacted with directly on its API by any other components. For example, in the Patroni docs, a cluster is constructed by configuring HAProxy to monitor Patroni on port 8008 and send traffic to PostgreSQL. This doesn't apply in our setup.

Do the Patroni services communicate with each other via their API?
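For context, Patroni's REST API on port 8008 answers health probes with role-dependent status codes, which is what external components (for example, HAProxy in the Patroni docs) poll. A minimal sketch, with `PATRONI_NODE` as a placeholder:

```shell
# Returns HTTP 200 only if this node currently holds the leader lock,
# 503 otherwise; the /replica endpoint behaves the opposite way.
curl -s -o /dev/null -w '%{http_code}\n' http://PATRONI_NODE:8008/leader
```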
[5] max WAL senders
> For example, `max_wal_senders` by default is set to `5`. If you wish to change this you must set it with the `patroni['postgresql']['max_wal_senders']` configuration key.
Just above, we set this (rendered, source):

> In this document we are assuming 3 database nodes, which makes this configuration:

```ruby
patroni['postgresql']['max_wal_senders'] = 4
```
It might be useful to document the default, but surely at the point where we change it, not in another place, with no mention of the fact that we seem to recommend setting it explicitly.
This is clearer in the sample configuration:

```ruby
# Replace X with value of number of db nodes + 1 (OPTIONAL the default value is 5)
```
[6] PGBOUNCER_NODE
```ruby
gitlab_rails['db_host'] = 'PGBOUNCER_NODE' or 'INTERNAL_LOAD_BALANCER'
```

- The reference architectures use a load balancer.

> When using default setup, minimum configuration requires:
>
> [..]
>
> `PGBOUNCER_NODE`, is the IP address or a FQDN of the node running PgBouncer.
There's no mention of `PGBOUNCER_NODE` in the reference architectures, e.g. 50k, and the use of the singular is confusing given that folks will likely deploy multiple PgBouncers behind a load balancer.

Is it actually required, if it's not specified in the reference architecture procedures?
[7] repmgr is deprecated
PostgreSQL 12 is required from GitLab 14, therefore repmgr is history. Some references to repmgr should be removed or revised.
[a] A note effectively about migrating from repmgr to Patroni. Should this be in a 'migrating from repmgr to Patroni' section rather than in the docs for setting up Patroni?

> NOTE: The configuration of a Patroni node is very similar to a repmgr but shorter.
>
> [..]
>
> Then you can remove any `repmgr[...]` or repmgr-specific configuration as well. Here is an example similar to the one that was done with repmgr:
[b] Similarly:

> NOTE: As opposed to repmgr, once the nodes are reconfigured you do not need any further action or additional command to join the replicas.
[8] PostgreSQL post-configuration "primary node"
> SSH in to the primary node:

Is this a legacy repmgr term? Should this be 'leader'? In which case, 'Check the status of the leader and cluster' probably needs to come first, as there's no way, without checking, to know which node the leader is; a sketch of such a check follows at the end of this item.
From our Patroni docs:

> Patroni heavily relies on Consul to store the state of the cluster and elect a leader.
Alternatively, perhaps this is no longer needed. The post-config steps appear to come from the original repmgr docs.
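For reference, the leader can be identified from any database node with the `gitlab-ctl patroni members` subcommand, as sketched below:

```shell
# Prints the cluster member table; the row whose Role column reads
# "Leader" is the node to SSH into for primary-only steps.
sudo gitlab-ctl patroni members
```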
[9] PostgreSQL post-steps cannot be performed on a clean install
> Open a database prompt:

```shell
gitlab-psql -d gitlabhq_production
```

On a clean install this fails:

```shell
# gitlab-psql -d gitlabhq_production
psql: error: FATAL: database "gitlabhq_production" does not exist
```
The post-config steps appear to come from the original repmgr docs.
[10] required extensions
- 5k - source specifies just `pg_trgm`, as does 10k, 25k, 50k
- 3k - source specifies `pg_trgm` and `btree_gist`

We need a single source of truth (SSOT) for the current requirements.
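For reference, a minimal sketch of what enabling both extensions looks like (standard PostgreSQL `CREATE EXTENSION` statements; which extensions each architecture actually needs is exactly what this item asks to pin down):

```shell
# Enable the extensions idempotently on the GitLab database.
sudo gitlab-psql -d gitlabhq_production \
  -c 'CREATE EXTENSION IF NOT EXISTS pg_trgm;' \
  -c 'CREATE EXTENSION IF NOT EXISTS btree_gist;'
```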
[11] bypassing pgbouncer
Raised issue omnibus-gitlab#6204.

- Required for database restores (https://docs.gitlab.com/ee/raketasks/backup_restore.html#backup-and-restore-for-installations-using-pgbouncer)
- Required for upgrades (missing from the upgrades docs - https://docs.gitlab.com/omnibus/update/README.html#use-postgresql-ha); the instructions there are for repmgr.
Patroni actually provides a nice way to track the leader, so it's possible to create a load balancer entry for direct access to the primary:

> Patroni provides an HAProxy configuration, which will give your application a single endpoint for connecting to the cluster's leader. To configure, run:
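The quoted command is elided above. As an illustration of the underlying mechanism (a hedged sketch, not the Patroni-supplied configuration), the leader can be located by polling each node's REST API, since `/leader` returns HTTP 200 only on the current primary:

```shell
# Hypothetical probe loop over the node IPs used earlier in this issue;
# prints the node whose Patroni API reports it holds the leader lock.
for node in 172.18.0.101 172.18.0.102 172.18.0.103; do
  code=$(curl -s -o /dev/null -w '%{http_code}' "http://${node}:8008/leader")
  [ "$code" = "200" ] && echo "Patroni leader: ${node}"
done
```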
## Proposal

Revise the docs.
## Who can address the issue

We will ask for help from the QA team and Distribution to address these questions.