
database major upgrade - make promote_database in pg-upgrade.rb idempotent

Summary

Observed in a customer's environment, and during verification of the fix for #7841 (closed).

For pg-upgrade to work on a Geo secondary Rails database server, it promotes the database from a replica to read/write:

  pg_ctl -D #{@db_worker.data_dir} promote

This is a one-way trip. There's no pg_ctl command to reverse this.

If a Geo environment is partway through a PostgreSQL upgrade, the primary site has already been upgraded. It is therefore not possible to re-establish the back-level secondary as a replica of the primary, because servers running different PostgreSQL major versions cannot replicate from each other.

As a result, the second run through this code fails and the upgrade is reverted with the following error (see Relevant logs for more output):

pg_ctl: cannot promote server; server is not in standby mode
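This error comes from pg_ctl itself: it only promotes a running server whose control file reports the cluster state "in archive recovery". As a minimal sketch in Ruby (the language of pg-upgrade.rb), the helpers below read that state from pg_controldata output. The helper names and the abbreviated sample output are illustrative and not part of pg-upgrade.rb. A standby that has been shut down cleanly reports "shut down in recovery" instead, so a check like this assumes the database is running, as it is at this point in the upgrade log.

```ruby
# Sketch only: classify a PostgreSQL data directory as standby or promoted
# from `pg_controldata` output. pg_ctl refuses to promote unless the control
# file reports "in archive recovery", so this mirrors the check it makes.
STANDBY_STATE = 'in archive recovery'.freeze

# Returns the value of the "Database cluster state" line, or nil if absent.
def cluster_state(controldata_output)
  match = controldata_output.match(/^Database cluster state:\s+(.+?)\s*$/)
  match && match[1]
end

# True while the server is still a standby and `pg_ctl promote` can run.
def standby?(controldata_output)
  cluster_state(controldata_output) == STANDBY_STATE
end

# Abbreviated, illustrative pg_controldata output for each case.
replica  = "pg_control version number:            1300\n" \
           "Database cluster state:               in archive recovery\n"
promoted = "pg_control version number:            1300\n" \
           "Database cluster state:               in production\n"

puts standby?(replica)   # prints true
puts standby?(promoted)  # prints false
```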

Workaround

  1. Back up the Omnibus code:

    cd /opt/gitlab/embedded/service/omnibus-ctl
    cp -a pg-upgrade.rb pg-upgrade.rb_backup
  2. Edit pg-upgrade.rb and comment out these lines in the promote_database method, located at line 435 in 15.11.12:

      #@db_worker.run_pg_command(
      #  "#{base_path}/embedded/bin/pg_ctl -D #{@db_worker.data_dir} promote"
      #)
  3. Re-run gitlab-ctl pg-upgrade.

  4. Roll back the code change by restoring the backup:

    cd /opt/gitlab/embedded/service/omnibus-ctl
    mv pg-upgrade.rb_backup pg-upgrade.rb

Steps to reproduce

  1. Run pg-upgrade on a Geo secondary.
  2. Have it fail at any point between promoting the database and actually upgrading.
  3. Try to repeat the upgrade.

What is the current bug behavior?

If pg-upgrade fails on a Geo secondary after the promote step, it can leave the system in a state that cannot be upgraded: the Rails database has already been promoted, and promote_database is not idempotent, so every retry fails at the same step.

What is the expected correct behavior?

promote_database should either run pg_ctl promote only if the database is still a replica (in standby mode), or trap the error reporting that the server is not in standby mode and return success. The code in question:

  @db_worker.run_pg_command(
    "#{base_path}/embedded/bin/pg_ctl -D #{@db_worker.data_dir} promote"
  )
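A hypothetical sketch of the first option, a guard that skips promotion when the server is no longer a standby. It assumes a worker object standing in for @db_worker that responds to data_dir and run_pg_command, returning the command's stdout and raising on failure; that contract, the :promoted/:already_promoted return values, the FakeWorker class, and the availability of pg_controldata under embedded/bin are all assumptions made for illustration, not confirmed behavior of pg-upgrade.rb.

```ruby
# Hypothetical sketch of an idempotent promote step, not the real
# pg-upgrade.rb code. `worker` stands in for @db_worker and is ASSUMED to
# respond to #data_dir and #run_pg_command(cmd), returning the command's
# stdout and raising if the command fails.
STANDBY_STATE = 'in archive recovery'.freeze

# True while pg_controldata reports the cluster as a standby.
def standby?(controldata_output)
  controldata_output.match?(/^Database cluster state:\s+#{Regexp.escape(STANDBY_STATE)}\s*$/)
end

def promote_database(worker, base_path)
  bin = "#{base_path}/embedded/bin" # ASSUMPTION: pg_controldata lives here too
  controldata = worker.run_pg_command("#{bin}/pg_controldata -D #{worker.data_dir}")

  # Already promoted, for example by an earlier run that failed later on:
  # treat the step as done instead of letting pg_ctl error out.
  return :already_promoted unless standby?(controldata)

  worker.run_pg_command("#{bin}/pg_ctl -D #{worker.data_dir} promote")
  :promoted
end

# Illustrative fake: acts like a standby until it is promoted once.
class FakeWorker
  attr_reader :data_dir, :commands

  def initialize
    @data_dir = '/var/opt/gitlab/postgresql/data'
    @state = STANDBY_STATE
    @commands = []
  end

  def run_pg_command(cmd)
    @commands << cmd
    if cmd.include?('pg_controldata')
      "Database cluster state:               #{@state}\n"
    elsif cmd.end_with?(' promote')
      # Mimic pg_ctl's refusal to promote a server that is not a standby.
      raise 'pg_ctl: cannot promote server; server is not in standby mode' unless @state == STANDBY_STATE

      @state = 'in production'
      ''
    end
  end
end

worker = FakeWorker.new
p promote_database(worker, '/opt/gitlab') # prints :promoted
p promote_database(worker, '/opt/gitlab') # prints :already_promoted
```

Because the guard reads the state before acting, a retry after a partial failure returns early instead of failing. The other option, rescuing the "server is not in standby mode" error, avoids the extra command but relies on matching pg_ctl's message text, which is translated under non-English locales.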

Caution: this is not the only use case for this code; see, for example, gitlab#300761 (closed).

Relevant logs

Checking if PostgreSQL bin files are symlinked to the expected location: OK
Starting the database
Waiting 30 seconds to ensure tasks complete before PostgreSQL upgrade.
See https://docs.gitlab.com/omnibus/settings/database.html#upgrade-packaged-postgresql-server for details
If you do not want to upgrade the PostgreSQL server at this time, enter Ctrl-C and see the documentation for details

Please hit Ctrl-C now if you want to cancel the operation.
..............................
Detected a Geo secondary node
Upgrading the postgresql database
Promoting the database
STDOUT:
STDERR: pg_ctl: cannot promote server; server is not in standby mode
== Fatal error ==
There was an error promoting the database from standby, please check the logs and output.
== Reverting ==

Details of package version

Omnibus 15.11.12 and 15.11.13 are currently the main versions affected, because the PostgreSQL upgrade has to be done before upgrading to 16.0. On earlier 15.x releases, pg-upgrade fails for a different reason.

Environment details

  • Omnibus Geo

Configuration details

Provide the relevant sections of `/etc/gitlab/gitlab.rb`

Edited by Ben Prescott_