Consul is unable to watch additional Postgres services and perform failover due to hardcodes
While building out a new environment with two HA Postgres setups, one for GitLab and one for Praefect, we found that Consul is unable to watch the latter because of some hardcoded values.
This only applies to the watch side of Consul on PgBouncer nodes, where the watcher and its script are both hardcoded to a service named postgresql.
Specifically:
/var/opt/gitlab/consul/scripts/failover_pgbouncer
masters = find_masters(healthy_agents, 'service:postgresql')
On this line, the service the script looks for is hardcoded to postgresql. This should be configurable through an Omnibus setting.
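A minimal sketch of what a configurable lookup could look like, assuming the service name reaches the script through an environment variable. POSTGRESQL_SERVICE_NAME and service_check_id are hypothetical names, not existing Omnibus settings or helpers; only the 'service:postgresql' string comes from the script itself.

```ruby
# Default matches the value currently hardcoded in the script.
DEFAULT_SERVICE_NAME = 'postgresql'

# Build the check identifier from a configurable service name.
# POSTGRESQL_SERVICE_NAME is an assumed variable name for illustration.
# `env` accepts ENV or any Hash, which keeps the helper easy to test.
def service_check_id(env = ENV)
  name = env.fetch('POSTGRESQL_SERVICE_NAME', DEFAULT_SERVICE_NAME).strip
  name = DEFAULT_SERVICE_NAME if name.empty?
  "service:#{name}"
end
```

The result could then replace the literal in the existing call, e.g. find_masters(healthy_agents, service_check_id).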
/var/opt/gitlab/consul/config.d/watcher_postgresql.json
The generated file for the Consul watcher is hardcoded to follow only a service with the same name as the watcher, e.g. postgresql:
{
  "watches": [
    {
      "type": "service",
      "service": "postgresql",
      "args": [
        "/var/opt/gitlab/consul/scripts/failover_pgbouncer"
      ]
    }
  ]
}
The service name here should be overridable, as we can already do on the service side with consul['internal']['postgresql_service_name'].
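As an illustration of the requested behaviour, the sketch below renders the same watcher structure with the service name passed in rather than fixed. The watcher_config helper and the praefect-postgresql name are hypothetical; the actual file is generated by Omnibus, and this is not how it is implemented there.

```ruby
require 'json'

# Path taken from the generated watcher file shown in this issue.
FAILOVER_SCRIPT = '/var/opt/gitlab/consul/scripts/failover_pgbouncer'

# Hypothetical helper: build the watcher definition for a given service name,
# defaulting to the currently hardcoded value.
def watcher_config(service_name: 'postgresql', script: FAILOVER_SCRIPT)
  {
    'watches' => [
      { 'type' => 'service', 'service' => service_name, 'args' => [script] }
    ]
  }
end

# Example: a watcher for an assumed second Postgres service.
puts JSON.pretty_generate(watcher_config(service_name: 'praefect-postgresql'))
```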
gitlab-ctl-commands-ee/lib/pgbouncer.rb
@database = if attributes.key?('gitlab')
              attributes['gitlab']['gitlab-rails']['db_database']
            else
              'gitlabhq_production'
            end
[...]
def database_paused?
  return false unless running?

  databases = show_databases
  # In `show databases` output, column 10 gives paused status of database
  # (1 for paused and 0 for unpaused)
  paused_status = databases.lines.find { |x| x.match(/#{@database}/) }.split('|')[10].strip
  paused_status == "1"
end

def resume_if_paused
  pgbouncer_command("RESUME #{@database}") if database_paused?
end
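The snippet above derives @database from the GitLab Rails settings and assumes a matching row always exists in the output. One possible direction, which this issue does not spell out, is a check that takes the database name as an argument. The sketch below assumes pipe-separated rows with the database name in the first column and the paused flag in column 10, as the code comment describes; paused? is a hypothetical helper and the sample rows are made up for illustration.

```ruby
# Hypothetical helper: report whether `database` is paused according to
# pipe-separated `show databases` output (paused flag in column 10).
def paused?(show_databases_output, database)
  row = show_databases_output.lines.find do |line|
    # Compare the first column exactly instead of regex-matching the whole line.
    line.split('|').first.to_s.strip == database
  end
  return false if row.nil? # unknown database: treat as not paused

  row.split('|')[10].to_s.strip == '1'
end

# Made-up sample rows, not real PgBouncer output.
SAMPLE_OUTPUT = <<~ROWS
  gitlabhq_production|127.0.0.1|5432|gitlabhq_production||100|5|0|0|0|0|0
  praefect_production|127.0.0.1|5432|praefect_production||100|5|0|0|0|1|0
ROWS
```

With the sample rows, paused?(SAMPLE_OUTPUT, 'praefect_production') is true and paused?(SAMPLE_OUTPUT, 'gitlabhq_production') is false.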
Edited by Grant Young