Repmgr standby setup forces Postgres to create a socket with the primary Postgres node IP
Summary
Repmgr standby setup forces Postgres to create a socket with the primary Postgres node IP.
Steps to reproduce
This issue was found in versions 12.1.4 and above. To reproduce you need:
- a GitLab HA deployment with Consul;
- the primary Postgres node (with repmgr) up and running;
- a secondary Postgres node up and running.

Then try to set up the secondary Postgres node as a standby with the following command:
gitlab-ctl repmgr standby setup ${master_ip} -w
What is the current bug behavior?
Running gitlab-ctl repmgr standby setup ${master_ip} -w produces the following output:
Stopping the database
Removing the data
Cloning the data
Starting the database
uninitialized constant Timeout::TimeoutError
There is no repmgr command standby
Available repmgr commands:
master register -- Register the current node as a master node in the repmgr cluster
standby
clone MASTER -- Clone the data from node MASTER to set this node up as a standby server
register -- Register the node as a standby node in the cluster. Assumes clone has been done
setup MASTER -- Performs all steps necessary to setup the current node as a standby for MASTER
follow MASTER -- Follow the new master node MASTER
unregister --node=X -- Removes the node with id X from the cluster. Without --node removes the current node.
promote -- Promote the current node to be the master node
cluster show -- Displays the current membership status of the cluster
The timeout occurs because Postgres is no longer running: standby setup clones the primary's configuration, which contains the primary node's IP, so Postgres on the standby tries to create a listen socket bound to the primary's address. This can be verified in the Postgres logs on the secondary node, where 10.10.1.30 is the primary's IP:
2019-09-25_15:38:41.56520 LOG: could not bind IPv4 address "10.10.1.30": Cannot assign requested address
2019-09-25_15:38:41.56522 HINT: Is another postmaster already running on port 5432? If not, wait a few seconds and retry.
2019-09-25_15:38:41.56522 WARNING: could not create listen socket for "10.10.1.30"
2019-09-25_15:38:41.56522 FATAL: could not create any TCP/IP sockets
2019-09-25_15:38:41.56523 LOG: database system is shut down
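The mechanism can be checked directly: the `listen_addresses` value inside the cloned `postgresql.conf` still carries the primary's IP. A minimal sketch below, using a temporary file to stand in for the real config (on an Omnibus install the actual file lives under the Postgres data directory, typically `/var/opt/gitlab/postgresql/data/postgresql.conf`):

```shell
# Simulate the postgresql.conf that `standby setup` clones over from the
# primary; the IPs match the ones in this report.
conf=$(mktemp)
cat > "$conf" <<'EOF'
listen_addresses = '10.10.1.30'
port = 5432
EOF

# The standby's own address is 10.10.1.31, but the cloned config still
# points at the primary, so Postgres tries to bind a foreign IP and exits.
grep listen_addresses "$conf"
# prints: listen_addresses = '10.10.1.30'

rm -f "$conf"
```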
What is the expected correct behavior?
The expected behavior is for the secondary node to be set up as a standby, with Postgres restarting successfully after the primary's data has been cloned over.
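As a possible workaround (my own assumption, not a documented fix): since `gitlab-ctl reconfigure` regenerates `postgresql.conf` from `/etc/gitlab/gitlab.rb`, re-running it on the standby should restore the node's own `listen_address` (10.10.1.31 in this report) and let Postgres start again:

```shell
# Hypothetical workaround on the standby node; assumes an Omnibus install
# where postgresql['listen_address'] in /etc/gitlab/gitlab.rb is already
# set to the standby's own IP (as in the configuration below).
sudo gitlab-ctl reconfigure          # rewrite postgresql.conf from gitlab.rb
sudo gitlab-ctl restart postgresql   # restart with the corrected listen address
```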
Details of package version
Provide the package version installation details
gitlab-ee-12.1.4-ee.0.el7.x86_64
Note: this was also reproduced on version 12.3.0 with package gitlab-ee-12.3.0-ee.0.el7.x86_64.
Environment details
- Operating System: CentOS 7
- Installation Target: VM (VirtualBox)
- Installation Type: New Installation
- Is this a single or multiple node installation? Multi-node
- Resources
  - CPU: 1 CPU
  - Memory total: 2 GB
Configuration details
Provide the relevant sections of `/etc/gitlab/gitlab.rb`
roles ['postgres_role']
postgresql['port'] = 5432
postgresql['listen_address'] = '10.10.1.31'
postgresql['hot_standby'] = 'on'
postgresql['wal_level'] = 'replica'
postgresql['shared_preload_libraries'] = 'repmgr_funcs'
gitlab_rails['auto_migrate'] = false
consul['services'] = %w(postgresql)
postgresql['pgbouncer_user_password'] = 'xxxxx'
postgresql['sql_user_password'] = 'xxxxxxx'
postgresql['max_wal_senders'] = 4
postgresql['max_replication_slots'] = 4
postgresql['trust_auth_cidr_addresses'] = %w(127.0.0.1/32 10.10.1.30/32 10.10.1.31/32 10.10.1.32/32 10.10.1.33/32 10.10.1.38/32)
repmgr['trust_auth_cidr_addresses'] = %w(127.0.0.1/32 10.10.1.30/32 10.10.1.31/32 10.10.1.32/32)
consul['monitoring_service_discovery'] = true
node_exporter['listen_address'] = '10.10.1.31:9100'
postgres_exporter['listen_address'] = '10.10.1.31:9187'
postgres_exporter['env']['DATA_SOURCE_NAME'] = "user=gitlab password='xxxxxx' host=10.10.1.31 database=postgres sslmode=disable"
consul['configuration'] = {
  bind_addr: '10.10.1.31',
  retry_join: %w(10.10.1.34 10.10.1.35 10.10.1.36)
}
repmgr['master_on_initialization'] = false