
postgres replication failing from the gcp secondary

Summary

We are initiating Geo replication on postgres-01.db.gprd.gitlab.com from postgres-04.db.prd.gitlab.com, a secondary of the primary gitlab.com database.

We have attempted to initiate replication a few times, and in all cases it has failed. The most recent attempt reached around 30%, hours after the replication process was initiated, before it failed.

Troubleshooting so far

  • I have tried to pull logs from the IPsec tunnel, the virtual network gateway on Azure, and the VPN connection on Google. These give a very limited view of what is going on; everything looks fine on both sides.

The last two attempts: https://console.cloud.google.com/interconnect/vpn/tunnels/details/us-east1/gcp-azure-gprd?project=gitlab-production&tab=monitoring&duration=P1D

Screen_Shot_2018-02-02_at_11.17.50_AM

*** The primary geo node is `10.66.1.104` ***

*** Are you sure you want to continue (replicate/no)? ***
Confirmation: replicate
* Stopping PostgreSQL and all GitLab services
Enter the password for gitlab_repmgr@10.66.1.104:
* Checking for replication slot secondary_gprd
* Backing up postgresql.conf
* Moving old data directory to '/var/opt/gitlab/postgresql/data.1517535046'
* Starting base backup as the replicator user (gitlab_repmgr)
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
transaction log start point: 42ED/642D2858 on timeline 4
pg_basebackup: could not read COPY data: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
transaction log start point: 42ED/642D2858 on timeline 4
pg_basebackup: could not read COPY data: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
*** Initial replication failed! ***

Replication tool returned with a non zero exit status!

Troubleshooting tips:
  - replication should be run by root user
  - check your trust settings `md5_auth_cidr_addresses` in `gitlab.rb` on the primary node

Failed to execute: PGPASSFILE=/var/opt/gitlab/postgresql/.pgpass /opt/gitlab/embedded/bin/pg_basebackup -h 10.66.1.104 -p 5432 -D /var/opt/gitlab/postgresql/data -U gitlab_repmgr -v -x -P
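
For comparing repeated attempts, a small script can pull the start point and the failure mode out of each captured log. The following is only a sketch: `summarize_basebackup_log` is a hypothetical helper, not part of `gitlab-ctl` or PostgreSQL, and it assumes the `pg_basebackup` messages look exactly like the ones quoted above.

```python
import re

# Patterns copied from the pg_basebackup output quoted in this issue.
# Other PostgreSQL versions may word these messages differently (assumption).
START_POINT = re.compile(r"transaction log start point: (\S+) on timeline (\d+)")
CONNECTION_DROPPED = "server closed the connection unexpectedly"


def summarize_basebackup_log(text):
    """Return the start point, timeline, and whether the connection dropped."""
    summary = {"start_lsn": None, "timeline": None, "connection_dropped": False}
    for line in text.splitlines():
        match = START_POINT.search(line)
        if match:
            summary["start_lsn"] = match.group(1)
            summary["timeline"] = int(match.group(2))
        if CONNECTION_DROPPED in line:
            summary["connection_dropped"] = True
    return summary


# Excerpt of the log from this issue, used as sample input.
sample = """\
pg_basebackup: checkpoint completed
transaction log start point: 42ED/642D2858 on timeline 4
pg_basebackup: could not read COPY data: server closed the connection unexpectedly
"""
print(summarize_basebackup_log(sample))
```

Running this over the saved output of each attempt would show whether every failure is the same dropped connection, which helps separate a tunnel problem from an authentication or configuration problem.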
Edited by John Jarvis