Add a Geo PG replication error troubleshooting tip

Michael Kozono requested to merge mk-add-pg-troubleshooting-tip into master

I ran into this replication failure while setting up Geo, and this seems like the most appropriate place to document it. The transcript below shows the failed first attempt followed by a successful retry.

root@mike-geo-instance-template-2:~# gitlab-ctl reconfigure; gitlab-ctl restart postgresql
Starting Chef Client, version 12.12.15
resolving cookbooks for run list: ["gitlab-ee"]

<snip>

Chef Client finished, 89/620 resources updated in 01 minutes 29 seconds
gitlab Reconfigured!
ok: run: postgresql: (pid 18840) 0s
root@mike-geo-instance-template-2:~# gitlab-ctl replicate-geo-database --slot-name=secondary_example --host=10.128.0.22

---------------------------------------------------------------
WARNING: Make sure this script is run from the secondary server
---------------------------------------------------------------

*** You are about to delete your local PostgreSQL database, and replicate the primary database. ***
*** The primary geo node is `10.128.0.22` ***

*** Are you sure you want to continue (replicate/no)? ***
Confirmation: replicate
* Executing GitLab backup task to prevent accidental data loss
* Stopping PostgreSQL and all GitLab services
timeout: run: geo-logcursor: (pid 18827) 201s, want down, got TERM
ok: down: geo-postgresql: 0s, normally up
ok: down: gitaly: 1s, normally up
ok: down: gitlab-monitor: 0s, normally up
ok: down: gitlab-workhorse: 0s, normally up
ok: down: logrotate: 1s, normally up
ok: down: nginx: 0s, normally up
ok: down: node-exporter: 1s, normally up
ok: down: postgres-exporter: 0s, normally up
ok: down: postgresql: 0s, normally up
ok: down: prometheus: 1s, normally up
ok: down: redis: 0s, normally up
ok: down: redis-exporter: 0s, normally up
ok: down: sidekiq: 0s, normally up
ok: down: unicorn: 1s, normally up

*** Initial replication failed! ***

Replication tool returned with a non zero exit status!

Troubleshooting tips:
  - replication should be run by root user
  - check your trust settings `md5_auth_cidr_addresses` in `gitlab.rb` on the primary node

Failed to execute: gitlab-ctl stop

root@mike-geo-instance-template-2:~# gitlab-rake gitlab:tcp_check[10.128.0.22,5432]
TCP connection from 10.138.0.4:49506 to 10.128.0.22:5432 succeeded
root@mike-geo-instance-template-2:~# gitlab-ctl restart postgresql
ok: run: postgresql: (pid 19533) 0s
root@mike-geo-instance-template-2:~# gitlab-ctl replicate-geo-database --slot-name=secondary_example --host=10.128.0.22

---------------------------------------------------------------
WARNING: Make sure this script is run from the secondary server
---------------------------------------------------------------

*** You are about to delete your local PostgreSQL database, and replicate the primary database. ***
*** The primary geo node is `10.128.0.22` ***

*** Are you sure you want to continue (replicate/no)? ***
Confirmation: replicate
* Executing GitLab backup task to prevent accidental data loss
* Stopping PostgreSQL and all GitLab services
Enter the password for gitlab_replicator@10.128.0.22: 
* Checking for replication slot secondary_example
* Creating replication slot secondary_example
* Backing up postgresql.conf
* Moving old data directory to '/var/opt/gitlab/postgresql/data.1518480132'
* Starting base backup as the replicator user (gitlab_replicator)
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
transaction log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
39984/39984 kB (100%), 1/1 tablespace                                         
transaction log end point: 0/20000F8
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: base backup completed
* Writing recovery.conf file with sslmode=verify-ca
* Restoring postgresql.conf
* Setting ownership permissions in PostgreSQL data directory
* Starting PostgreSQL and all GitLab services
root@mike-geo-instance-template-2:~#
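
The sequence that worked in the transcript above can be summarized as a short checklist. This is only a sketch: the helper name `geo_retry_steps` is hypothetical, the host and slot name are the values from the transcript, and the function prints the commands instead of running them because `replicate-geo-database` deletes the local PostgreSQL database. The log shows the first attempt failing when `gitlab-ctl stop` timed out on `geo-logcursor`; it does not prove that restarting PostgreSQL was what made the retry succeed.

```shell
#!/bin/sh
# Placeholders taken from the transcript; override them for your own nodes.
PRIMARY_HOST="${PRIMARY_HOST:-10.128.0.22}"
SLOT_NAME="${SLOT_NAME:-secondary_example}"

# Hypothetical helper: print (do not execute) the steps that preceded the
# successful retry, in the order they appear in the transcript.
geo_retry_steps() {
  printf '%s\n' \
    "gitlab-rake gitlab:tcp_check[${PRIMARY_HOST},5432]" \
    "gitlab-ctl restart postgresql" \
    "gitlab-ctl replicate-geo-database --slot-name=${SLOT_NAME} --host=${PRIMARY_HOST}"
}

geo_retry_steps
```

Run the printed commands by hand as root on the secondary node, confirming each one succeeds before moving to the next.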
Edited by Michael Kozono