Add a howto listing the procedure to do a postgres switchover

Gregory Stark requested to merge howto-postgres-switchover into master

Currently this documents how to do the switchover in staging based on a single test.

Observations and open questions:

  1. Installing omnibus in production causes chef to run gitlab-ctl reconfigure despite the override.
     • This is still a mystery. But for the case documented here, where we're doing rolling restarts and installing while postgres is shut down, it's not actually a problem.
     • It may become a problem in the future once we change the prefix so we can avoid doing these rolling restarts (and just do pg_ctl restart).
  2. Shutting down the primary doesn't cause a failover until the 60s timeout expires. What command(s) can we run to trigger consul/repmgrd to do the failover immediately?
  3. Are all the role snippets that need to be updated afterwards really necessary?
     • The first is for the rails-db console, and we definitely want it to point to a replica.
     • The second is for the deploy node. It is currently necessary but should be straightforward to make autodiscovered.
     • I do not understand why the third bit is here. Isn't this the info that comes from consul?
  4. Having switched once, switching back did not work. I had to run gitlab-ctl repmgr standby promote to force postgres01 to become primary again.
  5. I should probably also run some commands to check the current consul state and to check that pgbouncer is actually configured correctly.
  6. PostgreSQL_ExporterErrors alerts in prometheus after the failover?
     • The replication slot seems to be left over on the old primary after setting it to follow :(
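
For the leftover slot in the last item, a minimal sketch of how it could be inspected and cleaned up from psql on the old primary. This assumes the slot is a physical replication slot that nothing is consuming any more. The slot name below is a placeholder, not the real name from this cluster, and this has not been tried as part of the staging test.

```sql
-- Run on the old primary (now following the new primary).
-- List all replication slots and whether a client is currently attached.
SELECT slot_name, slot_type, active, restart_lsn
FROM pg_replication_slots;

-- If a slot is listed with active = false and is no longer needed,
-- it can be dropped so it stops holding back WAL removal.
-- 'leftover_slot_name' is a hypothetical placeholder; substitute the
-- slot_name returned by the query above.
SELECT pg_drop_replication_slot('leftover_slot_name');
```

An inactive slot pins WAL on the node that owns it, so leaving one behind may eventually fill the disk. Whether it also explains the exporter alerts is still an open question.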
Edited by Gregory Stark