`promote-to-primary-node` silently fails reconfiguration due to postgresql service not listening
Summary
There are two problems with the `gitlab-ctl promote-to-primary-node` script during Disaster Recovery (https://docs.gitlab.com/ee/gitlab-geo/disaster-recovery.html#promoting-a-secondary-node):
- Silent failure during reconfiguration
- Reconfiguration fails because the PostgreSQL service is not listening at the address Rails uses to connect to the database.
What is the current bug behavior?
The script silently fails during the `Reconfiguring...` step:
root@mike-demo-2a:~# gitlab-ctl promote-to-primary-node
---------------------------------------
WARNING: Make sure your primary is down and also be aware that
this command only works for setups with one secondary.
If you have more of them please follow documentation in https://docs.gitlab.com/ee/gitlab-geo/disaster-recovery.html
---------------------------------------
*** Are you sure? (N/y): y
Promoting the Postgres to primary...
Reconfiguring...
Running gitlab-rake geo:set_secondary_as_primary...
---------------------------------------
Note: Rsync everything in /var/opt/gitlab/gitlab-rails/uploads and /var/opt/gitlab/gitlab-rails/shared from your old node to the new one !!!
---------------------------------------
root@mike-demo-2a:~#
Running `gitlab-ctl reconfigure` on its own revealed an error connecting to PostgreSQL:
Recipe: gitlab::database_migrations
* bash[migrate gitlab-rails database] action run
[execute] rake aborted!
PG::ConnectionBad: could not connect to server: Connection refused
Is the server running on host "10.156.0.2" and accepting
TCP/IP connections on port 5432?
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/db.rake:49:in `block (3 levels) in <top (required)>'
/opt/gitlab/embedded/bin/bundle:23:in `load'
/opt/gitlab/embedded/bin/bundle:23:in `<main>'
Tasks: TOP => gitlab:db:configure
(See full trace by running task with --trace)
================================================================================
Error executing action `run` on resource 'bash[migrate gitlab-rails database]'
================================================================================
Mixlib::ShellOut::ShellCommandFailed
------------------------------------
Expected process to exit with [0], but received '1'
---- Begin output of "bash" "/tmp/chef-script20171219-30573-172xgwi" ----
STDOUT: rake aborted!
PG::ConnectionBad: could not connect to server: Connection refused
Is the server running on host "10.156.0.2" and accepting
TCP/IP connections on port 5432?
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/db.rake:49:in `block (3 levels) in <top (required)>'
/opt/gitlab/embedded/bin/bundle:23:in `load'
/opt/gitlab/embedded/bin/bundle:23:in `<main>'
Tasks: TOP => gitlab:db:configure
(See full trace by running task with --trace)
STDERR:
---- End output of "bash" "/tmp/chef-script20171219-30573-172xgwi" ----
Ran "bash" "/tmp/chef-script20171219-30573-172xgwi" returned 1
Resource Declaration:
---------------------
# In /opt/gitlab/embedded/cookbooks/cache/cookbooks/gitlab/recipes/database_migrations.rb
51: bash "migrate gitlab-rails database" do
52: code <<-EOH
53: set -e
54: log_file="#{node['gitlab']['gitlab-rails']['log_directory']}/gitlab-rails-db-migrate-$(date +%Y-%m-%d-%H-%M-%S).log"
55: umask 077
56: /opt/gitlab/bin/gitlab-rake gitlab:db:configure 2>& 1 | tee ${log_file}
57: STATUS=${PIPESTATUS[0]}
58: echo $STATUS > #{db_migrate_status_file}
59: exit $STATUS
60: EOH
61: environment env_variables unless env_variables.empty?
62: notifies :run, "execute[clear the gitlab-rails cache]", :immediately
63: dependent_services.each do |svc|
64: notifies :restart, svc, :immediately
65: end
66: not_if "(test -f #{db_migrate_status_file}) && (cat #{db_migrate_status_file} | grep -Fx 0)"
67: only_if { node['gitlab']['gitlab-rails']['auto_migrate'] }
68: end
Compiled Resource:
------------------
# Declared in /opt/gitlab/embedded/cookbooks/cache/cookbooks/gitlab/recipes/database_migrations.rb:51:in `from_file'
bash("migrate gitlab-rails database") do
action [:run]
retries 0
retry_delay 2
default_guard_interpreter :default
command "migrate gitlab-rails database"
backup 5
returns 0
code " set -e\n log_file=\"/var/log/gitlab/gitlab-rails/gitlab-rails-db-migrate-$(date +%Y-%m-%d-%H-%M-%S).log\"\n umask 077\n /opt/gitlab/bin/gitlab-rake gitlab:db:configure 2>& 1 | tee ${log_file}\n STATUS=${PIPESTATUS[0]}\n echo $STATUS > /var/opt/gitlab/gitlab-rails/upgrade-status/db-migrate-982d93bc4cd32a64fa48905cd9110d03-8d9ad6e\n exit $STATUS\n"
interpreter "bash"
declared_type :bash
cookbook_name "gitlab"
recipe_name "database_migrations"
not_if "(test -f /var/opt/gitlab/gitlab-rails/upgrade-status/db-migrate-982d93bc4cd32a64fa48905cd9110d03-8d9ad6e) && (cat /var/opt/gitlab/gitlab-rails/upgrade-status/db-migrate-982d93bc4cd32a64fa48905cd9110d03-8d9ad6e | grep -Fx 0)"
only_if { #code block }
end
Platform:
---------
x86_64-linux
Running handlers:
Running handlers complete
Chef Client failed. 2 resources updated in 40 seconds
root@mike-demo-2:~#
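The `Connection refused` error above can be checked independently of Rails and Chef by attempting a plain TCP connection to the host and port from the error message. The following is a minimal sketch using only the Ruby standard library; `port_listening?` is a hypothetical helper name and is not part of gitlab-ctl.

```ruby
require "socket"

# Hypothetical helper: returns true if something accepts TCP connections
# on host:port within `timeout` seconds, and false if the connection is
# refused, unreachable, or times out.
def port_listening?(host, port, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# Example, using the host and port from the PG::ConnectionBad message:
#   port_listening?("10.156.0.2", 5432)
```

If this returns false for the address Rails is configured to use but true for another address, PostgreSQL is probably running but bound elsewhere, which would match the behavior described in this issue.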
What is the expected correct behavior?
The script should print the reconfigure output, and the reconfigure step should complete without errors.
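One way to get this behavior is for the script to stream the child command's output as it arrives and to abort when the exit status is non-zero. The sketch below illustrates the idea with `Open3` from the Ruby standard library; `run_and_report` is a hypothetical name, and this is not the actual gitlab-ctl implementation.

```ruby
require "open3"

# Hypothetical helper: runs `command`, echoes its combined stdout/stderr
# line by line to `out`, and raises if the exit status is non-zero.
def run_and_report(*command, out: $stdout)
  Open3.popen2e(*command) do |stdin, output, wait_thread|
    stdin.close
    output.each_line { |line| out.print(line) }
    status = wait_thread.value
    unless status.success?
      raise "#{command.join(' ')} failed with exit status #{status.exitstatus}"
    end
  end
  true
end
```

Called as `run_and_report("gitlab-ctl", "reconfigure")`, a helper like this would have shown the `PG::ConnectionBad` trace and stopped before the `geo:set_secondary_as_primary` step instead of continuing silently.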
Possible fixes
- The script should output the reconfigure output
- The script should disable migrations before reconfiguring
- The script should restart postgres after reconfiguring
- The script should reenable migrations after reconfiguring, if they were previously enabled
If we cannot or do not want to disable migrations programmatically in the script, we would need to add instructions for the sysadmin to disable migrations before running the script and re-enable them afterwards, similar to the Geo DB replication setup.
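For the manual route, the `only_if` guard in the recipe above reads `node['gitlab']['gitlab-rails']['auto_migrate']`, so the migration resource can be skipped through the corresponding setting in `/etc/gitlab/gitlab.rb`. A sketch of the fragment, assuming the default file location:

```ruby
# /etc/gitlab/gitlab.rb
# Skip the "migrate gitlab-rails database" resource during reconfigure.
gitlab_rails['auto_migrate'] = false
```

With this in place, the sysadmin would run `gitlab-ctl promote-to-primary-node`, then `gitlab-ctl restart postgresql`, and finally remove the line (or set it back to `true`) and run `gitlab-ctl reconfigure` again if migrations were previously enabled.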