Patroni cluster documentation queries
## Problem to solve

Resolve snags in the Patroni cluster documentation.
## Further details
[1] References to `repmgr` when setting up the database cluster, main docs. In the context of setting up Patroni:

[a] a `repmgr` database

> A database user is created with read-only access to the `repmgr` database.

Is any sort of database needed for Patroni?
[b] An account called `repmgr` for PgBouncer

> The service will have a regular database user account generated for it. This defaults to `repmgr`.

Is this account still created? If so, under a new name?
[2] Reference Architectures

todo

> Copy the `/etc/gitlab/gitlab-secrets.json` file from your Consul server, and replace the file of the same name on this server. If that file is not on this server, add the file from your Consul server to this server.

[a] Three Consul servers are built. There doesn't appear to be a step to sync them to each other, nor (as far as I can see) to supply any secrets in `gitlab.rb`.

Is this required? If it is, how does the cluster work with only the secrets from one arbitrary Consul node?
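For reference, the quoted step amounts to something like this minimal sketch (`CONSUL_SERVER` is a placeholder for whichever Consul node the secrets are taken from):

```shell
# Copy the shared secrets file from a Consul server onto this node,
# then reconfigure so services pick up the synced secrets.
scp root@CONSUL_SERVER:/etc/gitlab/gitlab-secrets.json /etc/gitlab/gitlab-secrets.json
sudo gitlab-ctl reconfigure
```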
[b] Also, looking in the main docs for Patroni, it says (source):

> Passwords are stored in the following locations:
>
> - `/etc/gitlab/gitlab.rb`: hashed
> - `/var/opt/gitlab/pgbouncer/pg_auth`: hashed
> - `/var/opt/gitlab/consul/.pgpass`: plaintext

We should document WHY the secrets file is needed. If customers get malfunctions because their secrets are out of sync, the WHY will help troubleshoot this.

We've done this for Praefect: !58962 (diffs)

> Omnibus GitLab installations can use `gitlab-secrets.json` for `GITLAB_SHELL_SECRET_TOKEN`.
[3] Port requirements - Consul

- Consul ports reference
- Consul config reference (search for "This is a nested object that allows setting the bind ports")

We only document 8300 (`server`) and 8500 (`http`).
[a] Consul port 8301, `serf_lan`, is not included in the port requirements (source).

In researching Consul port usage, I concluded that the clients and nodes locate each other using this port, so in full the configuration would be:

```ruby
consul['configuration'] = {
  retry_join: ['172.18.0.101:8301', '172.18.0.102:8301', '172.18.0.103:8301'],
}
```

I have built clusters configured this way, and it works; see the verification sketch after the citation below.
Citation: mention of configuring `retry_join`, which uses SERF_LAN:

> If you are unable to use auto-join, you can also follow the instructions in either of the auto-join sections, but instead of using a `provider` key in the `-retry-join` flag, you would need to pass the address of at least one Consul server, e.g.: `-retry-join=$CONSUL_SERVER_IP:$SERVER_SERFLAN_PORT`.
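As a quick verification (a sketch, assuming the Consul binary bundled by Omnibus GitLab at its usual path), gossip-pool membership over the Serf LAN port can be checked on any node:

```shell
# Lists every agent that has joined the gossip pool via 8301;
# healthy members report an "alive" status alongside their LAN address.
/opt/gitlab/embedded/bin/consul members
```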
[b] Consul docs seem to strongly advise allowing 8302 (`serf_wan`):

> The Serf WAN port. Default 8302. Set to -1 to disable. Note: this will disable WAN federation which is not recommended. Various catalog and WAN related endpoints will return errors or empty results. TCP and UDP.
[c] Do we not use the `dns` port, 8600? (This being the case, it can be disabled with `-1`.)
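If the answer is that we don't use it, a minimal sketch of disabling it via Consul's nested `ports` object follows (an assumption about how Omnibus passes `consul['configuration']` through to Consul's JSON config; untested here):

```ruby
# /etc/gitlab/gitlab.rb
consul['configuration'] = {
  ports: {
    dns: -1,  # assumption: -1 disables Consul's DNS interface (default 8600)
  },
}
```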
[4] Port requirements / connection flow
todo
[a] The documentation implies both the Consul servers and agents need to accept incoming connections. Is this really the case?

> Consul servers and agents connect to each other
[b] Patroni connectivity. Port 8008 is listed as of interest.

> Patroni actively manages the running PostgreSQL processes and configuration.

Patroni runs on the PostgreSQL servers, and I understand it isn't interacted with directly on its API by any other components. For example, in the Patroni docs, a cluster is constructed by configuring HAProxy to monitor Patroni on port 8008 and send traffic to PostgreSQL. This doesn't apply in our setup.

Do the Patroni services communicate with each other via their API?
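For context, Patroni's REST API on port 8008 answers health probes with role-dependent status codes, which is what external components (for example, HAProxy in the Patroni docs) poll. A minimal sketch, with `PATRONI_NODE` as a placeholder:

```shell
# Returns HTTP 200 only if this node currently holds the leader lock,
# 503 otherwise; the /replica endpoint behaves the opposite way.
curl -s -o /dev/null -w '%{http_code}\n' http://PATRONI_NODE:8008/leader
```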
[5] max WAL senders
> For example, `max_wal_senders` by default is set to `5`. If you wish to change this you must set it with the `patroni['postgresql']['max_wal_senders']` configuration key.
Just above, we set this (rendered, source):

> In this document we are assuming 3 database nodes, which makes this configuration:

```ruby
patroni['postgresql']['max_wal_senders'] = 4
```
It might be useful to document the default, but surely at the point where we change it, not in another place, with no mention of the fact that we seem to recommend setting it explicitly.
This is clearer in the sample configuration:

```ruby
# Replace X with value of number of db nodes + 1 (OPTIONAL the default value is 5)
```
[6] PGBOUNCER_NODE
```ruby
gitlab_rails['db_host'] = 'PGBOUNCER_NODE' or 'INTERNAL_LOAD_BALANCER'
```

- The reference architectures use a load balancer.

> When using default setup, minimum configuration requires:
>
> [..]
>
> `PGBOUNCER_NODE`, is the IP address or a FQDN of the node running PgBouncer.
There's no mention of `PGBOUNCER_NODE` in the reference architectures, e.g. 50k, and the use of the singular is confusing given that folks will likely deploy multiple PgBouncers behind a load balancer.

Is it actually required, if it's not specified in the reference architecture procedures?
[7] repmgr is deprecated
PostgreSQL 12 is required from GitLab 14, therefore repmgr is history. Some references to repmgr should be removed or revised.
[a] A note effectively about migrating from repmgr to Patroni. Should this be in a 'migrating from repmgr to Patroni' section rather than in the docs for setting up Patroni?

> NOTE: The configuration of a Patroni node is very similar to a repmgr but shorter.
>
> [..]
>
> Then you can remove any `repmgr[...]` or repmgr-specific configuration as well. Here is an example similar to the one that was done with repmgr:
[b] Similarly:

> NOTE: As opposed to repmgr, once the nodes are reconfigured you do not need any further action or additional command to join the replicas.
[8] PostgreSQL post-configuration "primary node"
> SSH in to the primary node:

Is this a legacy repmgr term? Should this be 'leader'? In which case, 'Check the status of the leader and cluster' probably needs to come first, as there's no way, without checking, to know which node the leader is; a sketch of such a check follows at the end of this item.
From our Patroni docs:

> Patroni heavily relies on Consul to store the state of the cluster and elect a leader.
Alternatively, perhaps this is no longer needed. The post-config steps appear to come from the original repmgr docs.
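For reference, the leader can be identified from any database node with the `gitlab-ctl patroni members` subcommand, as sketched below:

```shell
# Prints the cluster member table; the row whose Role column reads
# "Leader" is the node to SSH into for primary-only steps.
sudo gitlab-ctl patroni members
```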
[9] PostgreSQL post-steps cannot be performed on a clean install
> Open a database prompt:

```shell
gitlab-psql -d gitlabhq_production
```

On a clean install this fails:

```shell
# gitlab-psql -d gitlabhq_production
psql: error: FATAL: database "gitlabhq_production" does not exist
```
The post-config steps appear to come from the original repmgr docs.
[10] required extensions
- 5k - source specifies just `pg_trgm`, as does 10k, 25k, 50k
- 3k - source specifies `pg_trgm` and `btree_gist`

We need a single source of truth (SSOT) for the current requirements.
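For reference, a minimal sketch of what enabling both extensions looks like (standard PostgreSQL `CREATE EXTENSION` statements; which extensions each architecture actually needs is exactly what this item asks to pin down):

```shell
# Enable the extensions idempotently on the GitLab database.
sudo gitlab-psql -d gitlabhq_production \
  -c 'CREATE EXTENSION IF NOT EXISTS pg_trgm;' \
  -c 'CREATE EXTENSION IF NOT EXISTS btree_gist;'
```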
[11] bypassing pgbouncer
Raised issue omnibus-gitlab#6204.

- Required for database restores (https://docs.gitlab.com/ee/raketasks/backup_restore.html#backup-and-restore-for-installations-using-pgbouncer)
- Required for upgrades (missing from the upgrades docs - https://docs.gitlab.com/omnibus/update/README.html#use-postgresql-ha); the instructions there are for repmgr.
Patroni actually provides a nice way to track the leader, so it's possible to create a load balancer entry for direct access to the primary:

> Patroni provides an HAProxy configuration, which will give your application a single endpoint for connecting to the cluster's leader. To configure, run:
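The quoted command is elided above. As an illustration of the underlying mechanism (a hedged sketch, not the Patroni-supplied configuration), the leader can be located by polling each node's REST API, since `/leader` returns HTTP 200 only on the current primary:

```shell
# Hypothetical probe loop over the node IPs used earlier in this issue;
# prints the node whose Patroni API reports it holds the leader lock.
for node in 172.18.0.101 172.18.0.102 172.18.0.103; do
  code=$(curl -s -o /dev/null -w '%{http_code}' "http://${node}:8008/leader")
  [ "$code" = "200" ] && echo "Patroni leader: ${node}"
done
```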
## Proposal

Revise the docs.
## Who can address the issue

We will ask for help from the QA team and Distribution to address these questions.