pg-upgrade of a Patroni cluster uses a hardcoded timeout for the maximum PostgreSQL startup time
Summary
If the database is large, using pg-upgrade to upgrade PostgreSQL to a new major version leaves the replicas in a bad state, because the maximum startup wait is hardcoded to 120 seconds.
This MR seems related: omnibus-gitlab!6321 (merged). It refers to https://jihulab.com/gitlab-cn/gitlab/-/issues/686, which appears to describe the same issue we are seeing. As far as I can tell, however, the underlying problem was not resolved, since pg-upgrade.rb still has the 120-second value hardcoded.
Steps to reproduce
This was observed on a system created with the GitLab Environment Toolkit. The database was 21 GB.
- Follow the upgrade steps from the GitLab documentation (https://docs.gitlab.com/ee/administration/postgresql/replication_and_failover.html#upgrading-postgresql-major-version-in-a-patroni-cluster).
- When running step 7 of the PostgreSQL upgrade, the upgrade fails on both replicas with an error like: 'RuntimeError: PostgreSQL did not respond before service checks were exhausted'
The replica is then reverted to the original PostgreSQL version, but it no longer has a data directory. Patroni has already been started again at this point, so before moving forward the user must 1) stop Patroni and 2) recreate the data directory and set the correct permissions. (If Patroni is not stopped first, the behavior is confusing from the user's perspective: any data directory the user creates is removed again.)
What is the current bug behavior?
Running gitlab-ctl pg-upgrade on a Patroni replica leaves the replica in a bad state if PostgreSQL takes longer than 120 seconds to start.
What is the expected correct behavior?
The PostgreSQL startup timeout should be configurable to allow a longer startup time (i.e. when the database is large enough that it needs longer to start successfully).
Possible fixes
There is already a --timeout option for pg-upgrade, but the problematic 120-second value is hardcoded in the common_post_upgrade function in pg-upgrade.rb (https://gitlab.com/gitlab-org/omnibus-gitlab/-/blob/master/files/gitlab-ctl-commands/pg-upgrade.rb#L341), so it is not affected by that option.
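A minimal sketch of what a configurable startup wait could look like, assuming the timeout is expressed in seconds. The method name wait_for_postgresql, its parameters, and the injectable sleeper are hypothetical and are not taken from pg-upgrade.rb; only the 120-second default and the error message come from this report.

```ruby
# Hypothetical sketch, NOT the actual omnibus-gitlab implementation.
# Polls a readiness check until it succeeds or the configured timeout
# (assumed to be in seconds) is used up, instead of a fixed 120 seconds.
DEFAULT_STARTUP_TIMEOUT = 120 # mirrors the value currently hardcoded

def wait_for_postgresql(timeout: DEFAULT_STARTUP_TIMEOUT, interval: 1,
                        sleeper: ->(seconds) { sleep(seconds) })
  raise ArgumentError, 'timeout must be positive' unless timeout.positive?
  raise ArgumentError, 'interval must be positive' unless interval.positive?

  # Number of checks that fit into the timeout window (at least one).
  attempts = [(timeout.to_f / interval).ceil, 1].max

  attempts.times do |i|
    # The caller's block performs the actual readiness check.
    return true if yield

    # Skip the sleep after the final failed check.
    sleeper.call(interval) unless i == attempts - 1
  end

  # A bare string raise produces a RuntimeError, matching the observed error.
  raise 'PostgreSQL did not respond before service checks were exhausted'
end
```

In a real fix, the timeout would presumably be passed through from pg-upgrade's existing --timeout option or from a new setting, and the readiness check in the block could, for example, run PostgreSQL's pg_isready utility. Injecting the sleeper keeps the loop testable without real waits.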