Geo Patroni members are listed as "pending restart" and WAL control functions cannot be executed during recovery
GEO setup
We have set up Geo with GitLab v18.1.1; the tracking DB runs on the primary PostgreSQL node of the secondary site. Both the primary and secondary GitLab sites are up, and the GitLab services of the primary and secondary sites are reachable under different domain URLs.
On the primary PostgreSQL node of the secondary site, two PostgreSQL services are running.
PostgreSQL as a member of the Patroni cluster:
/opt/gitlab/embedded/bin/postgres -D /var/opt/gitlab/postgresql/data --config-file=/var/opt/gitlab/postgresql/data/postgresql.conf --listen_addresses=0.0.0.0 --port=5432 --cluster_name=postgresql-ha --wal_level=replica --hot_standby=on ...
The Geo tracking DB for the secondary site is also running:
/opt/gitlab/embedded/postgresql/16/bin/postgres -D /var/opt/gitlab/geo-postgresql/data
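As a quick sanity check of which role each of the two instances plays, the standard recovery-status function can be run against each one (a sketch; it assumes the ports shown above, with the Patroni-managed instance on 5432):

```sql
-- Against the Patroni-managed instance (port 5432): a standby leader
-- is still applying WAL, so this returns true.
-- Against the Geo tracking DB (geo-postgresql): it is a regular
-- primary, so this returns false.
SELECT pg_is_in_recovery();
```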
Describe the wrong behaviour
After deploying the Geo setup, the Patroni members at the secondary site still show a "pending restart" status.
secondary-site:postgresql-1> gitlab-ctl patroni members
+ Cluster: postgresql-ha ------------------------------------------------------------------+
| Member | Host | Role | State | TL | Lag in MB | Pending restart |
+----------------+-----------+----------------+---------+----+-----------+-----------------+
| ges-postgres-1 | 10.7.0.21 | Replica | running | 27 | 0 | * |
| ges-postgres-2 | 10.7.0.22 | Replica | running | 27 | 5 | * |
| ges-postgres-3 | 10.7.0.23 | Standby Leader | running | 27 | | * |
+----------------+-----------+----------------+---------+----+-----------+-----------------+
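Patroni raises the pending-restart flag when a parameter has been changed whose new value only takes effect after a restart. Which parameters are responsible can be checked on each member with a query against the standard pg_settings view (a diagnostic sketch, not part of the GitLab tooling):

```sql
-- Lists parameters whose changed value requires a server restart.
-- A non-empty result explains the "*" in the "Pending restart" column.
SELECT name, setting, pending_restart
FROM pg_settings
WHERE pending_restart;
```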
On the Geo secondary site, the Patroni log (/var/log/gitlab/patroni/current) shows the standby leader failing a WAL control function every 15 seconds because the server is in recovery:
2025-07-08_09:04:09.63008 2025-07-08 09:04:09,629 INFO: no action. I am (ges-postgres-3), the standby leader with the lock
2025-07-08_09:04:18.80030 ERROR: recovery is in progress
2025-07-08_09:04:18.80033 HINT: WAL control functions cannot be executed during recovery.
2025-07-08_09:04:18.80034 STATEMENT:
2025-07-08_09:04:18.80034 SELECT slot_name, database, active, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
2025-07-08_09:04:18.80034 FROM pg_replication_slots
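The failing statement calls pg_current_wal_lsn(), which PostgreSQL refuses to execute while the server is in recovery; on a standby, the equivalent position is the replay LSN. A recovery-aware variant of the same slot-lag query (a sketch, not the query GitLab actually ships) would branch on pg_is_in_recovery():

```sql
-- On a standby, use the last-replayed LSN instead of
-- pg_current_wal_lsn(), which raises "recovery is in progress".
SELECT slot_name, database, active,
       pg_wal_lsn_diff(
         CASE WHEN pg_is_in_recovery()
              THEN pg_last_wal_replay_lsn()
              ELSE pg_current_wal_lsn()
         END,
         restart_lsn) AS lag_bytes
FROM pg_replication_slots;
```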
Expected behavior
After the PostgreSQL servers in the Patroni cluster have been reconfigured/restarted, we would expect the DCS state to no longer report "pending restart" for every Patroni cluster member on the secondary site.
No "recovery is in progress" errors should appear in the log at all.