pg-upgrade command could leave upgrades in a bad state

Problem Description

This restates a problem discovered during remediation from testing: gitlab-org/release/tasks#10168 (closed)

  1. The gitlab-ctl pg-upgrade command was run, but it failed after hitting the default 600-second timeout.

This left me wondering what had and had not completed. Since this should be an idempotent command, I simply ran it a second time.

# gitlab-ctl pg-upgrade
Checking for an omnibus managed postgresql: OK
Checking if postgresql['version'] is set: OK
Checking if we already upgraded: OK
The latest version 14.11 is already running, nothing to do

The above output isn't actually true. We never ran steps 7-12 in this documentation: https://docs.gitlab.com/omnibus/settings/database.html#upgrade-packaged-postgresql-server. The binaries were upgraded, but the data migration did not happen, the service was not running, and any database-specific migrations/configurations were not put into place.

If an end user saw this, they could assume that the remaining steps continued running in the background, leading them to believe that all is well. However, when running gitlab-ctl status, they'll notice that not only is Postgres not running, but neither are about half of the other services, which are stopped on purpose for the upgrade. If a user were to run gitlab-ctl start, Postgres version N+1 would be running against the old database content! This could be a dangerous situation, depending on what changed between Postgres versions.
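One way a user could check for this state before starting services is to compare the major version recorded in the data directory's PG_VERSION file with the version of the installed binary. The following is only a rough sketch, not something pg-upgrade does today: the helper name `pg_data_matches_binary` is hypothetical, and it assumes PostgreSQL 10 or later, where PG_VERSION contains only the major version.

```shell
#!/bin/sh
# Hypothetical helper (not part of gitlab-ctl): succeeds only when the major
# version stored in <data_dir>/PG_VERSION equals the major version of the
# given binary version string.
#   return 0: versions match
#   return 1: versions differ (binaries upgraded, data not migrated)
#   return 2: PG_VERSION could not be read
pg_data_matches_binary() {
    data_dir="$1"        # directory containing PG_VERSION
    binary_version="$2"  # e.g. "14.11"

    [ -r "$data_dir/PG_VERSION" ] || return 2

    # Assumption: PostgreSQL 10+, where PG_VERSION holds just the major number.
    data_major=$(cat "$data_dir/PG_VERSION")
    # Strip everything from the first dot onward: "14.11" -> "14".
    binary_major=${binary_version%%.*}

    [ "$data_major" = "$binary_major" ]
}
```

On an Omnibus install this might be called with the data directory and the version reported by the embedded binary, for example `/var/opt/gitlab/postgresql/data` and the last field of `/opt/gitlab/embedded/bin/postgres --version`. Both paths are assumed defaults and may differ on a given system. A non-zero result would suggest that running gitlab-ctl start is not safe yet.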

We should revisit the UX for this command to ensure it is safe for users that rely on this.