PostgreSQL replication didn't work with Patroni because the loopback address was required in trust_auth_cidr_addresses
Summary
Errors after restarting the Patroni nodes, broadly:
FATAL: no pg_hba.conf entry for replication connection from host "127.0.0.1", user "gitlab_replicator"
I conclude this is the reason for the other error:
FATAL: could not receive data from WAL stream: ERROR: requested starting point 0/5000000 is ahead of the WAL flush position of this server 0/41803F0
I don't think the cluster was actually replicating prior to the restart, as these errors were already occurring. I resolved this by reinitializing the secondaries.
Logging this as an issue in case these errors arise outside of our testing.
Fix
In gitlab.rb, add 127.0.0.1/32 to postgresql['trust_auth_cidr_addresses']:
postgresql['trust_auth_cidr_addresses'] = ['172.18.0.0/24','127.0.0.1/32']
# or
postgresql['trust_auth_cidr_addresses'] = %w(172.18.0.0/24 127.0.0.1/32)
Then run gitlab-ctl reconfigure on each node.
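As a sanity check on why the fix works (a sketch, not GitLab code: the covered helper is hypothetical, using Python's standard ipaddress module to mirror CIDR matching at the IP level), the original trust list simply doesn't cover the loopback address that the replication connection arrives from:

```python
import ipaddress

# Hypothetical helper: does any trusted CIDR cover this client address?
def covered(addr, cidrs):
    ip = ipaddress.ip_address(addr)
    return any(ip in ipaddress.ip_network(c) for c in cidrs)

original = ['172.18.0.0/24']                  # before the fix
fixed    = ['172.18.0.0/24', '127.0.0.1/32']  # after the fix

print(covered('127.0.0.1', original))  # False: loopback replication is rejected
print(covered('127.0.0.1', fixed))     # True: loopback replication is allowed
```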
Steps to reproduce
I have a Patroni cluster running in Docker using the Omnibus images.
The cluster assembled fine, but after restarting the three Patroni containers I found that replication wasn't working.
for i in 1 2 3 ; do docker restart patroni$i ; done
# curl -vs http://172.18.0.112:8008/cluster | jq .
* About to connect() to 172.18.0.112 port 8008 (#0)
* Trying 172.18.0.112...
* Connected to 172.18.0.112 (172.18.0.112) port 8008 (#0)
> GET /cluster HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 172.18.0.112:8008
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Server: BaseHTTP/0.6 Python/3.7.10
< Date: Tue, 08 Jun 2021 07:57:40 GMT
< Content-Type: application/json
<
{ [data not shown]
* Closing connection 0
{
"members": [
{
"name": "patroni1",
"role": "leader",
"state": "running",
"api_url": "http://172.18.0.111:8008/patroni",
"host": "172.18.0.111",
"port": 5432,
"timeline": 2
},
{
"name": "patroni2",
"role": "replica",
"state": "running",
"api_url": "http://172.18.0.112:8008/patroni",
"host": "172.18.0.112",
"port": 5432,
"lag": 56
},
{
"name": "patroni3",
"role": "replica",
"state": "running",
"api_url": "http://172.18.0.113:8008/patroni",
"host": "172.18.0.113",
"port": 5432,
"lag": 0
}
]
}
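The /cluster payload above can be filtered for replicas reporting lag. A minimal sketch (the trimmed sample below mirrors the output shown and is not live data; lagging_replicas is a hypothetical helper):

```python
import json

# Sample shaped like the /cluster response above (fields trimmed)
payload = json.loads("""
{"members": [
  {"name": "patroni1", "role": "leader",  "state": "running"},
  {"name": "patroni2", "role": "replica", "state": "running", "lag": 56},
  {"name": "patroni3", "role": "replica", "state": "running", "lag": 0}
]}
""")

def lagging_replicas(members):
    # Flag replicas whose lag is nonzero or missing from the payload
    return [m["name"] for m in members
            if m.get("role") == "replica" and m.get("lag") != 0]

print(lagging_replicas(payload["members"]))  # ['patroni2']
```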
The leader has not changed, and it was the leader at shutdown:
2021-06-08 07:39:13,765 INFO: Lock owner: patroni1; I am patroni1
2021-06-08 07:39:13,768 INFO: no action. i am the leader with the lock
2021-06-08 07:39:23,777 INFO: Lock owner: patroni1; I am patroni1
2021-06-08 07:39:23,799 INFO: no action. i am the leader with the lock
LOG: received fast shutdown request
LOG: aborting any active transactions
FATAL: terminating connection due to administrator command
LOG: background worker "logical replication launcher" (PID 350) exited with exit code 1
LOG: shutting down
LOG: database system is shut down
LOG: started streaming WAL from primary at 0/5000000 on timeline 2
FATAL: could not receive data from WAL stream: ERROR: requested starting point 0/5000000 is ahead of the WAL flush position of this server 0/4000388
FATAL: no pg_hba.conf entry for replication connection from host "127.0.0.1", user "gitlab_replicator", SSL on
FATAL: no pg_hba.conf entry for replication connection from host "127.0.0.1", user "gitlab_replicator", SSL off
2021-06-08 07:46:06,740 ERROR: Can not fetch local timeline and lsn from replication connection
Traceback (most recent call last):
File "/opt/gitlab/embedded/lib/python3.7/site-packages/patroni/postgresql/__init__.py", line 737, in get_replica_timeline
with self.get_replication_connection_cursor(**self.config.local_replication_address) as cur:
File "/opt/gitlab/embedded/lib/python3.7/contextlib.py", line 112, in __enter__
return next(self.gen)
File "/opt/gitlab/embedded/lib/python3.7/site-packages/patroni/postgresql/__init__.py", line 732, in get_replication_connection_cursor
with get_connection_cursor(**conn_kwargs) as cur:
File "/opt/gitlab/embedded/lib/python3.7/contextlib.py", line 112, in __enter__
return next(self.gen)
File "/opt/gitlab/embedded/lib/python3.7/site-packages/patroni/postgresql/connection.py", line 43, in get_connection_cursor
with psycopg2.connect(**kwargs) as conn:
File "/opt/gitlab/embedded/lib/python3.7/site-packages/psycopg2-2.8.6-py3.7-linux-x86_64.egg/psycopg2/__init__.py", line 127, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: FATAL: no pg_hba.conf entry for replication connection from host "127.0.0.1", user "gitlab_replicator", SSL on
FATAL: no pg_hba.conf entry for replication connection from host "127.0.0.1", user "gitlab_replicator", SSL off
FATAL: no pg_hba.conf entry for replication connection from host "127.0.0.1",
Unlike with repmgr, it is not documented for Patroni that the loopback address is required. This came up in another issue.
I tried adding the loopback entries on all three nodes:
- update content in file /var/opt/gitlab/postgresql/data/pg_hba.conf from 94a1d6 to 3a7100
--- /var/opt/gitlab/postgresql/data/pg_hba.conf 2021-06-06 13:56:20.585654101 +0000
+++ /var/opt/gitlab/postgresql/data/.chef-pg_hba20210608-2707-1is63n6.conf 2021-06-08 08:10:56.828252334 +0000
@@ -72,6 +72,8 @@
host all all 172.18.0.0/24 trust
host replication gitlab_replicator 172.18.0.0/24 trust
+host all all 127.0.0.1/32 trust
+host replication gitlab_replicator 127.0.0.1/32 trust
Afterwards, the traceback and the fatal errors about the loopback were gone.
2021-06-08 08:13:30,525 INFO: Lock owner: patroni1; I am patroni3
2021-06-08 08:13:30,525 INFO: does not have lock
2021-06-08 08:13:30,533 INFO: no action. i am a secondary and i am following a leader
LOG: started streaming WAL from primary at 0/5000000 on timeline 2
FATAL: could not receive data from WAL stream: ERROR: requested starting point 0/5000000 is ahead of the WAL flush position of this server 0/41803F0
LOG: started streaming WAL from primary at 0/5000000 on timeline 2
FATAL: could not receive data from WAL stream: ERROR: requested starting point 0/5000000 is ahead of the WAL flush position of this server 0/41803F0
2021-06-08 08:13:40,613 INFO: Lock owner: patroni1; I am patroni3
2021-06-08 08:13:40,614 INFO: does not have lock
2021-06-08 08:13:40,627 INFO: no action. i am a secondary and i am following a leader
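The remaining WAL error can be checked arithmetically: a pg_lsn value like 0/5000000 is a pair of 32-bit hex halves, and comparing the two positions from the log confirms the replica is requesting WAL beyond what the leader has flushed. A small sketch (lsn_to_int is a hypothetical helper, not a Patroni function):

```python
def lsn_to_int(lsn):
    # A pg_lsn is "X/Y": X is the high 32 bits and Y the low 32 bits, in hex
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

requested = lsn_to_int("0/5000000")  # the replica's requested starting point
flushed   = lsn_to_int("0/41803F0")  # the leader's WAL flush position

print(requested > flushed)   # True: the replica asks for WAL the leader never wrote
print(requested - flushed)   # 15203344 bytes of divergence
```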
What is the current bug behavior?
Is the loopback address required in the trusted CIDR ranges when running Patroni?
I thought I did everything in the implementation instructions, but nothing alerted me to the cluster not being synced.
What is the expected correct behavior?
No errors about the loopback.
Cluster build instructions should include a step that highlights when the cluster isn't actually in sync. If there are no writes to the database, the 'lag' field doesn't seem helpful.
# gitlab-ctl patroni members
+ Cluster: postgresql-ha (6970678148837286213) ------+---------+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------------------------------+--------------+---------+---------+----+-----------+
| gitlab-testdbc1-patroni1.watertower | 172.18.0.111 | Leader | running | 2 | |
| gitlab-testdbc1-patroni2.watertower | 172.18.0.112 | Replica | running | 2 | 0 |
| gitlab-testdbc1-patroni3.watertower | 172.18.0.113 | Replica | running | 2 | 0 |
+-------------------------------------+--------------+---------+---------+----+-----------+
Relevant logs
as above
Details of package version
# docker ps | egrep 'IMAGE|patroni' | cut -b -50
CONTAINER ID IMAGE
641b49c5075f gitlab/gitlab-ee:13.12.2-ee.0
75fc2fbb5b80 gitlab/gitlab-ee:13.12.2-ee.0
954abe2cf779 gitlab/gitlab-ee:13.12.2-ee.0
Environment details
gitlab/gitlab-ee:13.12.2-ee.0
Containers (Ubuntu-based) running in Docker on CentOS 7.
Configuration details
Provide the relevant sections of `/etc/gitlab/gitlab.rb`
# grep -v password patroni*/config/gitlab.rb
patroni1/config/gitlab.rb:roles ['postgres_role']
patroni1/config/gitlab.rb:
patroni1/config/gitlab.rb:consul['configuration'] = {
patroni1/config/gitlab.rb:  retry_join: ['172.18.0.101:8301','172.18.0.102:8301','172.18.0.103:8301'],
patroni1/config/gitlab.rb:  server: false,
patroni1/config/gitlab.rb:}
patroni1/config/gitlab.rb:
patroni1/config/gitlab.rb:# Enable Patroni (which automatically disables Repmgr).
patroni1/config/gitlab.rb:patroni['enable'] = true
patroni1/config/gitlab.rb:
patroni1/config/gitlab.rb:# PostgreSQL configuration
patroni1/config/gitlab.rb:postgresql['listen_address'] = '0.0.0.0'
patroni1/config/gitlab.rb:postgresql['trust_auth_cidr_addresses'] = ['172.18.0.0/24','127.0.0.1/32']
patroni1/config/gitlab.rb:
patroni1/config/gitlab.rb:# Configure the Consul agent
patroni1/config/gitlab.rb:consul['services'] = ['postgresql']
patroni1/config/gitlab.rb:
patroni1/config/gitlab.rb:
patroni1/config/gitlab.rb:# to prevent replication from using up all of the available database connections
patroni1/config/gitlab.rb:# set to one more than the number of PG servers
patroni1/config/gitlab.rb:patroni['postgresql']['max_wal_senders'] = 4
patroni1/config/gitlab.rb:patroni['postgresql']['max_replication_slots'] = 4
patroni1/config/gitlab.rb:
patroni1/config/gitlab.rb:# disable prometheus
patroni1/config/gitlab.rb:prometheus_monitoring['enable'] = false
patroni1/config/gitlab.rb:# logrotate not needed, sv handles consul
patroni1/config/gitlab.rb:logrotate['enable'] = false
patroni1/config/gitlab.rb:# Disable auto migrations
patroni1/config/gitlab.rb:gitlab_rails['auto_migrate'] = false
#--
patroni2/config/gitlab.rb and patroni3/config/gitlab.rb are identical apart from the filename prefix.