
Patroni requires a superuser with password to execute pg_rewind

Patroni relies on the configured superuser to run some operations (for example, a CHECKPOINT on the leader) before running pg_rewind.

The current cookbook defines the superuser with a username only and no password:

  authentication:
    superuser:
      username: gitlab-psql

This is the type of error you see when Patroni needs to pg_rewind but can't:

2021-01-15_06:42:03.58713 2021-01-15 06:42:03,586 INFO: running pg_rewind from remote_master:c702da0c-56fc-11eb-9dab-42010aa40038
2021-01-15_06:42:03.60123 2021-01-15 06:42:03,600 ERROR: Exception during CHECKPOINT
2021-01-15_06:42:03.60136 Traceback (most recent call last):
2021-01-15_06:42:03.60144   File "/opt/gitlab/embedded/lib/python3.7/site-packages/patroni/postgresql/__init__.py", line 520, in checkpoint
2021-01-15_06:42:03.60152     with get_connection_cursor(**connect_kwargs) as cur:
2021-01-15_06:42:03.60159   File "/opt/gitlab/embedded/lib/python3.7/contextlib.py", line 112, in __enter__
2021-01-15_06:42:03.60166     return next(self.gen)
2021-01-15_06:42:03.60173   File "/opt/gitlab/embedded/lib/python3.7/site-packages/patroni/postgresql/connection.py", line 43, in get_connection_cursor
2021-01-15_06:42:03.60181     with psycopg2.connect(**kwargs) as conn:
2021-01-15_06:42:03.60188   File "/opt/gitlab/embedded/lib/python3.7/site-packages/psycopg2-2.8.6-py3.7-linux-x86_64.egg/psycopg2/__init__.py", line 127, in connect
2021-01-15_06:42:03.60195     conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
2021-01-15_06:42:03.60203 psycopg2.OperationalError: fe_sendauth: no password supplied
2021-01-15_06:42:03.60211
2021-01-15_06:42:03.60217 2021-01-15 06:42:03,600 WARNING: Can not use remote_master:c702da0c-56fc-11eb-9dab-42010aa40038 for rewind: not accessible or not healty

On the leader, the PostgreSQL log shows the cause: the connection matches an md5 rule in pg_hba.conf, but gitlab-psql has no password assigned:

2021-01-15_06:43:43.60810 FATAL:  password authentication failed for user "gitlab-psql"
2021-01-15_06:43:43.60833 DETAIL:  User "gitlab-psql" has no password assigned.
2021-01-15_06:43:43.60843       Connection matched pg_hba.conf line 92: "host    all         all         10.164.0.56/32           md5"
2021-01-15_06:43:48.23390 2021-01-15 06:43:48,233 INFO: Lock owner: gabriel-patroni-primary-geo.c.group-geo-f9c951.internal; I am gabriel-patroni-primary-geo.c.group-geo-f9c951.internal
2021-01-15_06:43:48.25345 2021-01-15 06:43:48,253 INFO: no action.  i am the leader with the lock

If you set the wrong password, you will see this on the Patroni node that is trying to run pg_rewind:

2021-01-15_06:38:38.08361 2021-01-15 06:38:38,083 INFO: running pg_rewind from remote_master:4c813ec2-56fc-11eb-9fed-42010aa40038
2021-01-15_06:38:38.09808 2021-01-15 06:38:38,097 ERROR: Exception during CHECKPOINT
2021-01-15_06:38:38.09813 Traceback (most recent call last):
2021-01-15_06:38:38.09814   File "/opt/gitlab/embedded/lib/python3.7/site-packages/patroni/postgresql/__init__.py", line 520, in checkpoint
2021-01-15_06:38:38.09815     with get_connection_cursor(**connect_kwargs) as cur:
2021-01-15_06:38:38.09816   File "/opt/gitlab/embedded/lib/python3.7/contextlib.py", line 112, in __enter__
2021-01-15_06:38:38.09817     return next(self.gen)
2021-01-15_06:38:38.09818   File "/opt/gitlab/embedded/lib/python3.7/site-packages/patroni/postgresql/connection.py", line 43, in get_connection_cursor
2021-01-15_06:38:38.09819     with psycopg2.connect(**kwargs) as conn:
2021-01-15_06:38:38.09822   File "/opt/gitlab/embedded/lib/python3.7/site-packages/psycopg2-2.8.6-py3.7-linux-x86_64.egg/psycopg2/__init__.py", line 127, in connect
2021-01-15_06:38:38.09823     conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
2021-01-15_06:38:38.09824 psycopg2.OperationalError: FATAL:  password authentication failed for user "gitlab-psql"
2021-01-15_06:38:38.09825 FATAL:  password authentication failed for user "gitlab-psql"
2021-01-15_06:38:38.09826
2021-01-15_06:38:38.09827 2021-01-15 06:38:38,097 WARNING: Can not use remote_master:4c813ec2-56fc-11eb-9fed-42010aa40038 for rewind: not accessible or not healty

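Additionally, the user needs to have a database with the same name as the user. This does not appear to be configurable in the Patroni check, presumably because the connection does not specify a database and therefore defaults to one named after the user.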

Here is how it shows up in the log once the node can connect (by using the workaround below):

2021-01-15_06:54:41.35868 2021-01-15 06:54:41,358 INFO: running pg_rewind from remote_master:8aae2b9a-56fe-11eb-9532-42010aa40038
2021-01-15_06:54:41.37094 2021-01-15 06:54:41,370 ERROR: Exception during CHECKPOINT
2021-01-15_06:54:41.37110 Traceback (most recent call last):
2021-01-15_06:54:41.37117   File "/opt/gitlab/embedded/lib/python3.7/site-packages/patroni/postgresql/__init__.py", line 520, in checkpoint
2021-01-15_06:54:41.37125     with get_connection_cursor(**connect_kwargs) as cur:
2021-01-15_06:54:41.37132   File "/opt/gitlab/embedded/lib/python3.7/contextlib.py", line 112, in __enter__
2021-01-15_06:54:41.37139     return next(self.gen)
2021-01-15_06:54:41.37148   File "/opt/gitlab/embedded/lib/python3.7/site-packages/patroni/postgresql/connection.py", line 43, in get_connection_cursor
2021-01-15_06:54:41.37156     with psycopg2.connect(**kwargs) as conn:
2021-01-15_06:54:41.37162   File "/opt/gitlab/embedded/lib/python3.7/site-packages/psycopg2-2.8.6-py3.7-linux-x86_64.egg/psycopg2/__init__.py", line 127, in connect
2021-01-15_06:54:41.37176     conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
2021-01-15_06:54:41.37187 psycopg2.OperationalError: FATAL:  database "gitlab-psql" does not exist
2021-01-15_06:54:41.37195
2021-01-15_06:54:41.37202 2021-01-15 06:54:41,370 WARNING: Can not use remote_master:8aae2b9a-56fe-11eb-9532-42010aa40038 for rewind: not accessible or not healty

Proposal

We either need to set a password for gitlab-psql or create a special gitlab_admin user with a password, similar to how we configure gitlab_replicator.

That user, with its password, would then be set in the superuser credentials instead of the current password-less gitlab-psql entry.

If a new user is created, an empty database with the same name will be created for it, so that the connection used to run the CHECKPOINT succeeds.
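As a rough sketch of what the proposal could produce in the generated Patroni configuration, assuming the gitlab_admin option is chosen: the `password` key under `authentication.superuser` is a standard Patroni setting, while the password value here is only a placeholder and the final key layout depends on how the cookbook renders the file.

```yaml
postgresql:
  authentication:
    superuser:
      # gitlab_admin is the user name suggested in this proposal;
      # it would replace the password-less gitlab-psql entry.
      username: gitlab_admin
      # Placeholder value (assumption); the real secret would come
      # from the cookbook's configuration, not be hard-coded.
      password: "<superuser-password>"
```

With a password present, connections that match an md5 rule in pg_hba.conf on the leader should be able to authenticate instead of failing with "no password supplied".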

Workaround

The existing workaround, which is being used during tests, is to add the hosts to trust in pg_hba so that no password is required. This is not safe for production, so we need to fix the situation described above.

Related upstream issues

Edited by Gabriel Mazetto