Error running reconfigure on standby instances
Coming from https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/10175
In `postgresql::enable`, the `pg_helper.is_slave?` method bases its result on the output of `SELECT pg_is_in_recovery()`. If that command fails, `is_slave?` returns false, and `reconfigure` tries to create the `gitlab` user.
In some instances the database might not be ready (e.g. a standby that is still initializing), and a psql command will exit non-zero with a "database not ready" error. `reconfigure` will read this as the database not being a standby instance and attempt to create the user, which will also fail.
I think we have a few options.

Leave `is_slave?` as is and either:
- Use `pg_isready` to wait for the system to come online. This could take a while, and in this particular case we might wait a long time only to determine that we don't need to do anything.
- Use `pg_isready` to skip steps that require a ready database, possibly warning the user that we are doing so. Running `reconfigure` later, when the database is ready, should do the right thing.

Of these two, I prefer the latter.
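
A rough sketch of the "skip and warn" variant, to make the idea concrete. The names `database_state` and `with_ready_database` are made up for illustration and are not existing helpers. The exit-code mapping follows the `pg_isready` documentation. The `runner` parameter is only there so the logic can be tried without a running database.

```ruby
require "open3"

# Exit codes documented for pg_isready.
PG_ISREADY_STATES = {
  0 => :accepting,   # server is accepting connections
  1 => :rejecting,   # server is rejecting connections (for example, during startup)
  2 => :no_response, # no response to the connection attempt
  3 => :no_attempt   # no attempt was made (for example, invalid parameters)
}.freeze

# Hypothetical helper: runs pg_isready and maps its exit code to a symbol.
# `runner` is an assumption of this sketch; it defaults to shelling out.
def database_state(host:, port:, runner: nil)
  runner ||= lambda do |h, p|
    _output, status = Open3.capture2e("pg_isready", "-h", h, "-p", p.to_s)
    status.exitstatus
  end
  PG_ISREADY_STATES.fetch(runner.call(host, port), :unknown)
end

# Hypothetical guard: yields only when the database accepts connections,
# otherwise prints a warning and returns false so the caller can skip.
def with_ready_database(host:, port:, runner: nil, warn_io: $stderr)
  state = database_state(host: host, port: port, runner: runner)
  if state == :accepting
    yield
    true
  else
    warn_io.puts("PostgreSQL is not ready (#{state}); skipping database steps. " \
                 "Run reconfigure again once it is up.")
    false
  end
end
```

The user-creation step would then sit inside the `with_ready_database` block, so an initializing standby produces a warning instead of a failed run.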
We could check for a `recovery.conf` file in `data_dir` as part of `is_slave?`. But this would only work temporarily, as `recovery.conf` is going away in PostgreSQL 12.
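
For reference, a minimal sketch of what a file-based check could look like. `standby_marker_present?` is a hypothetical name. The sketch also looks for `standby.signal`, which is the file PostgreSQL 12 uses in place of `recovery.conf` for standby mode. Note that a `recovery.conf` can also be present during point-in-time recovery, so this is a hint rather than proof of standby status.

```ruby
# Marker files that suggest the instance is (or is becoming) a standby.
# recovery.conf applies before PostgreSQL 12, standby.signal from 12 onward.
STANDBY_MARKERS = %w[recovery.conf standby.signal].freeze

# Hypothetical helper: true if any marker file exists in the data directory.
def standby_marker_present?(data_dir)
  STANDBY_MARKERS.any? { |name| File.exist?(File.join(data_dir, name)) }
end
```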
We may also consider making `psql_cmd` return more than a boolean, or adding another method for executing psql commands, so that we can differentiate between exit codes.
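
One possible shape for that, as a sketch only. `PsqlResult`, `run_psql` and `recovery_state` are hypothetical names, the `template1` default database is an assumption, and the injectable `runner` exists just so the logic can be exercised without a server. The point is that a failed query maps to `:unknown` instead of being folded into "not a standby".

```ruby
require "open3"

# Hypothetical result type carrying everything psql reported.
PsqlResult = Struct.new(:stdout, :stderr, :exit_code) do
  def success?
    exit_code == 0
  end
end

# Hypothetical alternative to a boolean-only helper.
# -t prints tuples only and -A disables alignment, so a boolean
# column comes back as a bare "t" or "f".
def run_psql(sql, database: "template1", runner: nil)
  runner ||= lambda do |db, query|
    out, err, status = Open3.capture3("psql", "-d", db, "-t", "-A", "-c", query)
    [out, err, status.exitstatus]
  end
  PsqlResult.new(*runner.call(database, sql))
end

# Three-way answer: :standby, :primary, or :unknown when psql failed
# (for example, because the database is still starting up).
def recovery_state(runner: nil)
  result = run_psql("SELECT pg_is_in_recovery()", runner: runner)
  return :unknown unless result.success?

  case result.stdout.strip
  when "t" then :standby
  when "f" then :primary
  else :unknown
  end
end
```

With something like this, `reconfigure` could create the user only when the state is `:primary`, and skip or warn when it is `:unknown`.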