Patroni requires a superuser with a password to execute pg_rewind
Before running pg_rewind, Patroni connects to the leader as the configured superuser
to run a CHECKPOINT. That connection uses password authentication, so the superuser needs a password.
The current cookbook defines the superuser with a username only:
authentication:
  superuser:
    username: gitlab-psql
This is the type of error Patroni logs when it needs to run pg_rewind but cannot connect to the leader:
2021-01-15_06:42:03.58713 2021-01-15 06:42:03,586 INFO: running pg_rewind from remote_master:c702da0c-56fc-11eb-9dab-42010aa40038
2021-01-15_06:42:03.60123 2021-01-15 06:42:03,600 ERROR: Exception during CHECKPOINT
2021-01-15_06:42:03.60136 Traceback (most recent call last):
2021-01-15_06:42:03.60144 File "/opt/gitlab/embedded/lib/python3.7/site-packages/patroni/postgresql/__init__.py", line 520, in checkpoint
2021-01-15_06:42:03.60152 with get_connection_cursor(**connect_kwargs) as cur:
2021-01-15_06:42:03.60159 File "/opt/gitlab/embedded/lib/python3.7/contextlib.py", line 112, in __enter__
2021-01-15_06:42:03.60166 return next(self.gen)
2021-01-15_06:42:03.60173 File "/opt/gitlab/embedded/lib/python3.7/site-packages/patroni/postgresql/connection.py", line 43, in get_connection_cursor
2021-01-15_06:42:03.60181 with psycopg2.connect(**kwargs) as conn:
2021-01-15_06:42:03.60188 File "/opt/gitlab/embedded/lib/python3.7/site-packages/psycopg2-2.8.6-py3.7-linux-x86_64.egg/psycopg2/__init__.py", line 127, in connect
2021-01-15_06:42:03.60195 conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
2021-01-15_06:42:03.60203 psycopg2.OperationalError: fe_sendauth: no password supplied
2021-01-15_06:42:03.60211
2021-01-15_06:42:03.60217 2021-01-15 06:42:03,600 WARNING: Can not use remote_master:c702da0c-56fc-11eb-9dab-42010aa40038 for rewind: not accessible or not healty
The PostgreSQL log on the leader shows the cause: the connection matches an md5 rule in pg_hba.conf, but gitlab-psql has no password assigned:
2021-01-15_06:43:43.60810 FATAL: password authentication failed for user "gitlab-psql"
2021-01-15_06:43:43.60833 DETAIL: User "gitlab-psql" has no password assigned.
2021-01-15_06:43:43.60843 Connection matched pg_hba.conf line 92: "host all all 10.164.0.56/32 md5"
2021-01-15_06:43:48.23390 2021-01-15 06:43:48,233 INFO: Lock owner: gabriel-patroni-primary-geo.c.group-geo-f9c951.internal; I am gabriel-patroni-primary-geo.c.group-geo-f9c951.internal
2021-01-15_06:43:48.25345 2021-01-15 06:43:48,253 INFO: no action. i am the leader with the lock
If you configure the wrong password, the Patroni node that is trying to run pg_rewind logs this:
2021-01-15_06:38:38.08361 2021-01-15 06:38:38,083 INFO: running pg_rewind from remote_master:4c813ec2-56fc-11eb-9fed-42010aa40038
2021-01-15_06:38:38.09808 2021-01-15 06:38:38,097 ERROR: Exception during CHECKPOINT
2021-01-15_06:38:38.09813 Traceback (most recent call last):
2021-01-15_06:38:38.09814 File "/opt/gitlab/embedded/lib/python3.7/site-packages/patroni/postgresql/__init__.py", line 520, in checkpoint
2021-01-15_06:38:38.09815 with get_connection_cursor(**connect_kwargs) as cur:
2021-01-15_06:38:38.09816 File "/opt/gitlab/embedded/lib/python3.7/contextlib.py", line 112, in __enter__
2021-01-15_06:38:38.09817 return next(self.gen)
2021-01-15_06:38:38.09818 File "/opt/gitlab/embedded/lib/python3.7/site-packages/patroni/postgresql/connection.py", line 43, in get_connection_cursor
2021-01-15_06:38:38.09819 with psycopg2.connect(**kwargs) as conn:
2021-01-15_06:38:38.09822 File "/opt/gitlab/embedded/lib/python3.7/site-packages/psycopg2-2.8.6-py3.7-linux-x86_64.egg/psycopg2/__init__.py", line 127, in connect
2021-01-15_06:38:38.09823 conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
2021-01-15_06:38:38.09824 psycopg2.OperationalError: FATAL: password authentication failed for user "gitlab-psql"
2021-01-15_06:38:38.09825 FATAL: password authentication failed for user "gitlab-psql"
2021-01-15_06:38:38.09826
2021-01-15_06:38:38.09827 2021-01-15 06:38:38,097 WARNING: Can not use remote_master:4c813ec2-56fc-11eb-9fed-42010aa40038 for rewind: not accessible or not healty
Additionally, the user needs a database with the same name as the user (the database used by the Patroni check does not appear to be configurable).
Here is how this shows up in the log once the connection itself succeeds (using the workaround below):
2021-01-15_06:54:41.35868 2021-01-15 06:54:41,358 INFO: running pg_rewind from remote_master:8aae2b9a-56fe-11eb-9532-42010aa40038
2021-01-15_06:54:41.37094 2021-01-15 06:54:41,370 ERROR: Exception during CHECKPOINT
2021-01-15_06:54:41.37110 Traceback (most recent call last):
2021-01-15_06:54:41.37117 File "/opt/gitlab/embedded/lib/python3.7/site-packages/patroni/postgresql/__init__.py", line 520, in checkpoint
2021-01-15_06:54:41.37125 with get_connection_cursor(**connect_kwargs) as cur:
2021-01-15_06:54:41.37132 File "/opt/gitlab/embedded/lib/python3.7/contextlib.py", line 112, in __enter__
2021-01-15_06:54:41.37139 return next(self.gen)
2021-01-15_06:54:41.37148 File "/opt/gitlab/embedded/lib/python3.7/site-packages/patroni/postgresql/connection.py", line 43, in get_connection_cursor
2021-01-15_06:54:41.37156 with psycopg2.connect(**kwargs) as conn:
2021-01-15_06:54:41.37162 File "/opt/gitlab/embedded/lib/python3.7/site-packages/psycopg2-2.8.6-py3.7-linux-x86_64.egg/psycopg2/__init__.py", line 127, in connect
2021-01-15_06:54:41.37176 conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
2021-01-15_06:54:41.37187 psycopg2.OperationalError: FATAL: database "gitlab-psql" does not exist
2021-01-15_06:54:41.37195
2021-01-15_06:54:41.37202 2021-01-15 06:54:41,370 WARNING: Can not use remote_master:8aae2b9a-56fe-11eb-9532-42010aa40038 for rewind: not accessible or not healty
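The three failures above differ only in the final psycopg2.OperationalError line, so they can be told apart mechanically. The following is a minimal sketch of that idea, not part of Patroni or the cookbook: the function name `classify_rewind_failure` and the cause strings are hypothetical, and the patterns are copied from the log excerpts above.

```python
import re
from typing import Optional

# Patterns copied from the log excerpts above; the cause strings are
# illustrative wording, not messages produced by Patroni or PostgreSQL.
CAUSES = [
    (re.compile(r"fe_sendauth: no password supplied"),
     "no superuser password configured in Patroni"),
    (re.compile(r'password authentication failed for user "[^"]+"'),
     "password rejected by the leader (wrong, or none assigned to the role)"),
    (re.compile(r'database "[^"]+" does not exist'),
     "no database named after the superuser"),
]


def classify_rewind_failure(log_text: str) -> Optional[str]:
    """Return a short cause for the first OperationalError line, else None."""
    for line in log_text.splitlines():
        # Only the exception line carries the reason for the failure.
        if "OperationalError" not in line:
            continue
        for pattern, cause in CAUSES:
            if pattern.search(line):
                return cause
    return None
```

Feeding it the Patroni log of the node that attempted the rewind returns one of the three causes, or `None` if no matching exception line is present.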
Proposal
We either need to set a password for gitlab-psql or create a dedicated gitlab_admin user with a password, similar to how we configure gitlab_replicator.
This user would then replace gitlab-psql in the superuser credentials.
This new user will have an empty database created using the same name for the purpose of running the CHECKPOINT.
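As a rough illustration of the second option, the sketch below builds the two SQL statements that would create such a user and its same-named database. It is an assumption-laden example, not the cookbook's implementation: the helper names are hypothetical, the SUPERUSER attribute is assumed because the user is meant for the superuser credentials, and the literal quoting assumes PostgreSQL's default standard_conforming_strings = on.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote a string literal, doubling embedded single quotes.

    Assumes standard_conforming_strings = on (the PostgreSQL default).
    """
    return "'" + value.replace("'", "''") + "'"


def bootstrap_statements(role: str, password: str) -> List[str]:
    """Return SQL that creates the role and an empty database of the same name."""
    ident = quote_ident(role)
    return [
        # SUPERUSER is an assumption: the role is intended for Patroni's
        # superuser credentials.
        f"CREATE ROLE {ident} WITH LOGIN SUPERUSER PASSWORD {quote_literal(password)};",
        # Empty database named after the role, used only for the CHECKPOINT connection.
        f"CREATE DATABASE {ident} OWNER {ident};",
    ]
```

Calling `bootstrap_statements("gitlab_admin", password)` yields statements that could be run once on the leader. The same database statement applies to gitlab-psql if the first option is chosen instead.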
Workaround
The existing workaround, which is used during testing, is to add the hosts to pg_hba with the trust method, so that no password is required.
This is not safe for production, so the situation described above still needs to be fixed.