Add a howto listing the procedure to do a postgres switchover
Currently this documents how to do the switchover in staging based on a single test.
Observations and open questions:
- Installing omnibus in production is causing chef to run `gitlab-ctl reconfigure` despite the override.
  - This is still a mystery. But for the case documented here, where we're doing rolling restarts and installing while postgres is shut down, it's not actually a problem.
  - It may be a problem in the future once we change the prefix so we can avoid doing these rolling restarts (and just do `pg_ctl restart`).
- Shutting down the primary doesn't cause a failover until the 60s timeout expires. What command(s) can we run to trigger consul/repmgrd to do the failover immediately?
- Are all the role snippets that need to be updated afterwards really necessary?
  - The first is for the rails-db console and we definitely want it to point to a replica.
  - The second is for the deploy node and is currently necessary but should be straightforward to make autodiscovered.
  - I do not understand why the third bit is here. Isn't this the info that comes from consul?
- Having switched once, switching back did not work. I had to run `gitlab-ctl repmgr standby promote` to force postgres01 to become primary again.
- Probably I should also run some commands to check the current consul state and check that pgbouncer is actually configured correctly.
- PostgreSQL_ExporterErrors alerts in prometheus after the failover?
- The slot seems to be left over on the old primary after setting it to follow :(
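
A possible starting point for the post-switchover checks mentioned in the list above, as a rough sketch rather than a tested procedure. It assumes an Omnibus GitLab database node where `gitlab-ctl repmgr`, the embedded `consul` binary, and `gitlab-psql` are available. The function names and the `DRY_RUN` variable are made up for illustration.

```shell
#!/bin/sh
# Sketch only: read-only checks to run on a database node after a switchover.
# Assumption: Omnibus GitLab with repmgr and consul enabled on this node.

# Print each command before running it; with DRY_RUN=1, print without running.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

check_cluster_state() {
    # repmgr's view of which node is currently the primary
    run gitlab-ctl repmgr cluster show
    # consul agents this node knows about
    run /opt/gitlab/embedded/bin/consul members
}

check_replication_slots() {
    # Lists slots on the local postgres. On the old primary, a slot that is
    # still present but not active is probably the leftover one.
    run gitlab-psql -c "SELECT slot_name, active FROM pg_replication_slots;"
}
```

Sourcing the file and calling `check_cluster_state` and `check_replication_slots` on each database node should show the state after the switchover; setting `DRY_RUN=1` only prints the commands. The pgbouncer side is left out because `gitlab-ctl pgb-console` is interactive; running `SHOW DATABASES;` there should show which host pgbouncer points at. If a leftover slot is confirmed, `SELECT pg_drop_replication_slot('<slot_name>');` is the standard PostgreSQL way to remove it.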
Edited by Gregory Stark