Docs: Avoid downtime due to deploy node upgrade
Problem
While performing a zero-downtime update in gitlab#199836 (closed), we encountered downtime.
From gitlab#199836 (comment 280323486):
For zero downtime, the deploy node must not be serving requests while it is being upgraded. For example, remove it from the load balancer before the upgrade, and add it back after the pre-deployment migrations have run.
Also, you might choose a deploy node that runs Sidekiq or geo-logcursor. If so, those services need to be stopped, or else the upgraded code running before the pre-deploy migrations have finished will likely produce errors. And if no other node in the cluster is running those services, users will experience a reduction in functionality during the upgrade.
Partially resolves #5046 (closed)
Side note about stopping Sidekiq
When stopping Sidekiq ungracefully, you risk corrupting data. Ideally you would instead do:
- Find the main Sidekiq PID
- Send it TSTP to stop it from picking up jobs
- Wait for it to finish all running jobs
- Run gitlab-ctl stop sidekiq
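The steps above could be scripted roughly as follows. This is only a sketch, not the Omnibus implementation. It assumes the operator already knows the main Sidekiq PID, and that the Sidekiq process title contains a "[N of M busy]" marker that can be read with ps. The helper names (parse_busy_count, read_proctitle, drain_and_stop) and the timeout values are made up for illustration.

```python
import os
import re
import signal
import subprocess
import time


def parse_busy_count(proctitle):
    """Return the number of busy threads from a Sidekiq process title.

    Assumption: the title looks like "sidekiq 5.2.7 gitlab-rails [3 of 25 busy]".
    Returns None when the title does not match that shape.
    """
    match = re.search(r"\[(\d+) of \d+ busy\]", proctitle)
    return int(match.group(1)) if match else None


def read_proctitle(pid):
    """Read the command line of `pid` via ps (hypothetical helper)."""
    result = subprocess.run(
        ["ps", "-o", "args=", "-p", str(pid)],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.strip()


def stop_sidekiq_service():
    """Final step from the list above: stop the service."""
    subprocess.run(["gitlab-ctl", "stop", "sidekiq"], check=True)


def drain_and_stop(pid, timeout=600, interval=5,
                   send_signal=os.kill, read_title=read_proctitle,
                   stop=stop_sidekiq_service, sleep=time.sleep):
    """Quiet Sidekiq with TSTP, wait for running jobs, then stop it.

    Returns True if Sidekiq went idle and was stopped, or False on
    timeout, in which case the service is left running so the
    operator can decide what to do.
    """
    send_signal(pid, signal.SIGTSTP)  # stop picking up new jobs
    waited = 0
    while waited <= timeout:
        if parse_busy_count(read_title(pid)) == 0:
            stop()
            return True
        sleep(interval)
        waited += interval
    return False
```

For example, drain_and_stop(12345) would be called with the PID found in the first step. The signal, title reader, and stop command are passed in as parameters so the flow can be dry-run without touching a real Sidekiq process.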
Here is the issue to incorporate this process into omnibus commands: #4918