Geo: Database system identifier differs between the primary and standby
Summary
When setting up a Geo deployment using Patroni, we run into the following error:
2020-12-07_15:04:06.05456 DETAIL: The primary's identifier is 6903524667333487661, the standby's identifier is 6903523308768856671.
2020-12-07_15:04:11.05145 FATAL: database system identifier differs between the primary and standby
This appears to be caused by the secondary site already being part of an existing cluster and expecting the primary site to use the same identifier. Both sites work independently before we add them to the same Geo deployment, so each one has its own database system identifier.
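To confirm the mismatch before applying any workaround, the identifier can be read on a node of each site and compared. The following is a minimal sketch, not a verified procedure: the helper name `extract_system_identifier` is hypothetical, and it assumes `pg_controldata` is available under `/opt/gitlab/embedded/bin/` on an Omnibus install.

```shell
# Hypothetical helper: reads pg_controldata output on stdin and prints only
# the value of the "Database system identifier" line.
extract_system_identifier() {
  awk -F: '/^Database system identifier/ { gsub(/[[:space:]]/, "", $2); print $2 }'
}

# Assumed usage on a PostgreSQL node (the binary path is an assumption,
# the data directory is the one used in the workaround steps):
#   /opt/gitlab/embedded/bin/pg_controldata /var/opt/gitlab/postgresql/data \
#     | extract_system_identifier
```

If the values printed on a primary node and a secondary node differ, the standby cannot stream from the primary until its data directory is re-created from the primary.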
Currently the workaround steps are:
gitlab-ctl stop patroni
rm -rf /var/opt/gitlab/postgresql/data/
/opt/gitlab/embedded/bin/patronictl -c /var/opt/gitlab/patroni/patroni.yaml remove postgresql-ha
gitlab-ctl reconfigure
- This reconfigure will fail. It appears to wait for Patroni to start but never actually tries to start Patroni.
gitlab-ctl start patroni
These steps need to be run on all PostgreSQL nodes in the secondary site. The best time to run these steps is whilst adding the Patroni config on the secondary site's PostgreSQL nodes.
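Because the steps have to be repeated on every PostgreSQL node, they can be wrapped in a small script. This is only a sketch of the workaround above, not a tested tool: the `run` and `reset_patroni_node` names and the `DRY_RUN` variable are hypothetical. It defaults to printing the commands, since the second step deletes the data directory.

```shell
#!/bin/sh
# Sketch of the workaround for one PostgreSQL node in the secondary site.
# DRY_RUN defaults to 1, so commands are only printed; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

reset_patroni_node() {
  run gitlab-ctl stop patroni
  # Destructive: deletes the local PostgreSQL data directory.
  run rm -rf /var/opt/gitlab/postgresql/data/
  # Removes the old cluster's state; patronictl may ask for interactive confirmation.
  run /opt/gitlab/embedded/bin/patronictl -c /var/opt/gitlab/patroni/patroni.yaml remove postgresql-ha
  # The reconfigure is expected to fail at this point, so do not abort on it.
  run gitlab-ctl reconfigure || echo "reconfigure failed as expected; continuing" >&2
  run gitlab-ctl start patroni
}
```

Calling `reset_patroni_node` prints the five commands for review. Running it with `DRY_RUN=0` set in the environment executes them.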
Steps to reproduce
This is encountered by following the Geo documentation and using Patroni to handle replication on the primary and secondary sites.
What is the expected correct behavior?
Whilst setting up Geo, once we add the new config to connect our secondary site to the primary, the initial reconfigure should run the required steps to cleanly remove the secondary's old cluster and correctly add the secondary to the new one.
Environment details
During testing, this was reproduced using a 10k Reference Architecture environment for the primary site and a 3k Reference Architecture environment for the secondary. Both sites used Patroni and had 3 PostgreSQL nodes.