pg-upgrade fails silently when running analyze_new_cluster.sh

Summary

Two Large Premium customers reported that, after attempting to upgrade to PG11, they saw very high CPU usage, high load, and many timeouts. These are symptoms of an unvacuumed/unanalyzed database, which is especially likely after a major version upgrade. Both customers ended up reverting the PG upgrade because of this.

What actually happens is that we try to run the analyze_new_cluster.sh script that pg_upgrade generates, and we are supposed to catch any errors it raises.

An initial test run of gitlab-ctl pg-upgrade (with -V 11, as I was testing on 12.8) doesn't show any errors. However, the following query shows that no table has a last_analyze timestamp either:

gitlabhq_production=# SELECT count(*) FROM pg_stat_user_tables WHERE last_analyze IS NOT NULL;
 count
-------
     0
(1 row)

Next attempt, revert the pg-upgrade, and modify /opt/gitlab/embedded/service/omnibus-ctl/lib/gitlab_ctl/util.rb as follows, to log the output:

@@ -11,6 +11,8 @@ module GitlabCtl
       def get_command_output(command, user = nil, timeout = nil)
         begin
           shell_out = run_command(command, live: false, user: user, timeout: timeout)
+          $stdout.print shell_out.stdout
+          $stdout.print shell_out.stderr
           shell_out.error!
         rescue Mixlib::ShellOut::ShellCommandFailed
           raise GitlabCtl::Errors::ExecutionError.new(

And ta-da:

vacuumdb: could not connect to database template1: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/tmp/.s.PGSQL.5432"?

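It looks like analyze_new_cluster.sh relies on the default connection settings (the /tmp socket path) instead of the Omnibus socket directory, so vacuumdb cannot connect. The script fails, but the failure is silent because we don't catch it.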
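
One way to avoid this kind of silent failure is to capture both output streams and raise whenever the command exits non-zero. The following is only a minimal sketch using Ruby's standard Open3 module; run_and_report and CommandFailed are hypothetical names for illustration, not part of omnibus-gitlab, and the real fix may look quite different.

```ruby
require 'open3'

# Hypothetical error class, used only in this sketch.
class CommandFailed < StandardError; end

# Hypothetical helper: runs a command, captures stdout and stderr,
# and raises with both streams attached if the exit status is non-zero.
def run_and_report(command)
  stdout, stderr, status = Open3.capture3(command)
  unless status.success?
    raise CommandFailed,
          "#{command} exited with #{status.exitstatus}\n" \
          "STDOUT: #{stdout}\nSTDERR: #{stderr}"
  end
  stdout
end
```

With something along these lines, a vacuumdb connection error like the one above would end up in the exception message instead of being dropped.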

Manually running this (with the added -h) works:

su - gitlab-psql -c '"/opt/gitlab/embedded/postgresql/11/bin/vacuumdb" --all --analyze-in-stages -h /var/opt/gitlab/postgresql'
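
If the command were built in Ruby rather than taken from the generated script, the socket directory could be passed explicitly. This is a rough sketch under that assumption: analyze_command is a hypothetical helper, and only the vacuumdb flags shown in the manual command above are used.

```ruby
require 'shellwords'

# Hypothetical helper: builds a vacuumdb invocation that always passes
# the socket directory via -h instead of relying on the default in /tmp.
def analyze_command(bin_dir, socket_dir)
  [
    File.join(bin_dir, 'vacuumdb'),
    '--all',
    '--analyze-in-stages',
    '-h', socket_dir
  ].shelljoin # escapes each argument for safe use in a shell
end

puts analyze_command('/opt/gitlab/embedded/postgresql/11/bin',
                     '/var/opt/gitlab/postgresql')
```

Using shelljoin keeps the quoting correct even if one of the paths contains spaces.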

And afterwards the same query above shows the tables got analyzed:

gitlabhq_production=# SELECT count(*) FROM pg_stat_user_tables WHERE last_analyze IS NOT NULL;
 count
-------
   317
(1 row)

Steps to reproduce

Run gitlab-ctl pg-upgrade.

Details of package version

To reproduce, I used the following versions in order on Debian 9, reverting the upgrade between versions, with the same outcome each time:

  • 12.8.10-ee.0
  • upgrade to 12.9.5-ee.0
  • upgrade to 12.10.3-ee.0 with /etc/gitlab/disable-postgresql-upgrade (so that I can manually add the logging and test)

Environment details

Tested on a single Omnibus node in GCP, fresh install on Debian 9.

Configuration details

Default setup, nothing configured except external_url.

References