Consul is unable to watch additional Postgres services and perform failover due to hardcodes

While building out a new environment with two HA Postgres setups, one for GitLab and one for Praefect, we found that Consul is unable to watch the latter because of hardcoded values.

This only applies to the watch side of Consul on PgBouncer nodes, where both the watcher and its script are hardcoded to a service named postgresql:

/var/opt/gitlab/consul/scripts/failover_pgbouncer

63  masters = find_masters(healthy_agents, 'service:postgresql')

On this line, the service the script looks for is hardcoded to postgresql. This should be configurable through an Omnibus setting.
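
As a rough illustration of what a configurable lookup could look like, here is a minimal sketch. The `watch_service_name` key and the `service_filter` helper are hypothetical names made up for this example, not existing Omnibus settings or functions; only the `'service:postgresql'` string comes from the script above.

```ruby
# Hypothetical sketch: resolve the watched service name from configuration
# instead of hardcoding 'postgresql'. The 'watch_service_name' key is an
# assumption for illustration, not an existing Omnibus setting.
DEFAULT_SERVICE_NAME = 'postgresql'

# Returns the filter string in the same 'service:<name>' form the script
# currently passes to find_masters.
def service_filter(config)
  name = config.fetch('watch_service_name', DEFAULT_SERVICE_NAME).to_s.strip
  name = DEFAULT_SERVICE_NAME if name.empty?
  "service:#{name}"
end

puts service_filter({})
puts service_filter('watch_service_name' => 'praefect-postgresql')
```

With something along these lines, the call in the script might become `find_masters(healthy_agents, service_filter(config))`, where `config` is whatever settings hash the script has access to.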

/var/opt/gitlab/consul/config.d/watcher_postgresql.json

The generated file for the Consul Watcher is hardcoded to only follow a service that has the same name as the watcher, e.g. postgresql:

{
  "watches": [
    {
      "type": "service",
      "service": "postgresql",
      "args": [
        "/var/opt/gitlab/consul/scripts/failover_pgbouncer"
      ]
    }
  ]
}

The service name here should be overridable, as it already is on the service side with consul['internal']['postgresql_service_name'].
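
To make the request concrete, here is a minimal sketch of generating the same watcher definition with the service name passed in rather than fixed. The `watcher_config` helper and its keyword arguments are hypothetical; the JSON shape and the handler path are copied from the generated file above.

```ruby
require 'json'

# Hypothetical helper: builds the watcher definition for a given service
# name. Defaults reproduce the currently generated watcher_postgresql.json.
def watcher_config(service_name: 'postgresql',
                   handler: '/var/opt/gitlab/consul/scripts/failover_pgbouncer')
  {
    'watches' => [
      {
        'type' => 'service',
        'service' => service_name,
        'args' => [handler]
      }
    ]
  }
end

# Example: a watcher that follows a differently named Postgres service.
puts JSON.pretty_generate(watcher_config(service_name: 'praefect-postgresql'))
```

The watcher name and the watched service name are separate inputs here, which is the part the current template does not allow.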

gitlab-ctl-commands-ee/lib/pgbouncer.rb

      @database = if attributes.key?('gitlab')
                    attributes['gitlab']['gitlab-rails']['db_database']
                  else
                    'gitlabhq_production'
                  end

[...]

    def database_paused?
      return false unless running?

      databases = show_databases

      # In `show databases` output, column 10 gives paused status of database
      # (1 for paused and 0 for unpaused)
      paused_status = databases.lines.find { |x| x.match(/#{@database}/) }.split('|')[10].strip

      paused_status == "1"
    end

    def resume_if_paused
      pgbouncer_command("RESUME #{@database}") if database_paused?
    end
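
For illustration only, the paused check from the excerpt above could take the database name as an argument rather than relying on a fixed `@database`. This is a sketch under assumptions: the output is taken to be pipe-delimited with the database name in the first field, the paused column index (10) is copied from the comment in the excerpt, and the sample rows are made up for the example.

```ruby
# Column index taken from the comment in the excerpt above.
PAUSED_COLUMN = 10

# Standalone variant of the paused check: the `show databases` output and
# the database name are both passed in. Matches the name field exactly and
# returns false when the database is not listed.
def database_paused?(show_databases_output, database)
  row = show_databases_output.lines
                             .map { |line| line.split('|').map(&:strip) }
                             .find { |fields| fields.first == database }
  return false if row.nil? || row[PAUSED_COLUMN].nil?

  row[PAUSED_COLUMN] == '1'
end

# Made-up sample rows for illustration; real output may differ.
sample = <<~OUTPUT
  gitlabhq_production|127.0.0.1|5432|gitlabhq_production||20|0|0||0|0|0
  praefect_production|127.0.0.1|5432|praefect_production||20|0|0||0|1|0
OUTPUT

puts database_paused?(sample, 'praefect_production')
puts database_paused?(sample, 'gitlabhq_production')
```

Matching on the exact name field avoids one database name matching a row for another database whose name merely contains it.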
Edited by Grant Young