Docs: Avoid downtime due to deploy node upgrade (!3902) · Merge requests · GitLab.org / omnibus-gitlab

Problem

While performing a zero-downtime update in gitlab#199836 (closed), we encountered downtime.

From gitlab#199836 (comment 280323486):

For zero downtime, the deploy node must not be serving requests when we upgrade it. For example, it should be removed from the load balancer before the upgrade and added back after the pre-deployment migrations have run.

Also, you might choose a deploy node that runs Sidekiq or geo-logcursor. If so, those services need to be stopped; otherwise the upgraded code, running before the pre-deploy migrations have finished, will likely produce errors. If no other node in the cluster is running those services, users will experience reduced functionality during the upgrade.

Partially resolves #5046 (closed).

Side note about stopping Sidekiq

Stopping Sidekiq ungracefully risks corrupting data. Ideally, you would instead do the following:

Find the main Sidekiq PID.
Send it the TSTP signal so that it stops picking up new jobs.
Wait for it to finish all running jobs.
Run gitlab-ctl stop sidekiq.
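
The steps above could be scripted roughly as follows. This is a minimal sketch, not an Omnibus feature: the helper names (busy_count, read_proc_title, quiet_and_stop) are hypothetical, and it assumes the PID you pass in is the actual Sidekiq process, whose process title contains a "[N of M busy]" counter. If the PID belongs to a supervisor or wrapper process instead, TSTP would simply suspend that process, so verify the PID before using anything like this.

```python
import os
import re
import signal
import subprocess
import time

# Sidekiq sets its process title to something like
# "sidekiq 5.2.7 gitlab-rails [3 of 25 busy]".
BUSY_RE = re.compile(r"\[(\d+) of \d+ busy\]")


def busy_count(proc_title):
    """Return the busy-thread count parsed from a process title, or None."""
    match = BUSY_RE.search(proc_title)
    return int(match.group(1)) if match else None


def read_proc_title(pid):
    """Read the current command line of a process with ps."""
    result = subprocess.run(
        ["ps", "-o", "args=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def quiet_and_stop(pid, send_signal=os.kill, read_title=read_proc_title,
                   stop=None, sleep=time.sleep,
                   poll_seconds=5, timeout_seconds=600):
    """Quiet Sidekiq, wait for running jobs to drain, then stop the service."""
    if stop is None:
        # The final step from the list above.
        stop = lambda: subprocess.run(["gitlab-ctl", "stop", "sidekiq"], check=True)

    # TSTP tells Sidekiq to stop picking up new jobs.
    send_signal(pid, signal.SIGTSTP)

    waited = 0
    while busy_count(read_title(pid)) != 0:
        # An unparsable title counts as "not drained yet" until the timeout.
        if waited >= timeout_seconds:
            raise TimeoutError("Sidekiq still has running jobs")
        sleep(poll_seconds)
        waited += poll_seconds

    stop()
```

The signal, polling, and stop steps are passed in as parameters so that the flow can be dry-run without touching a real Sidekiq process. The timeout is an arbitrary placeholder; long-running jobs may need a larger value.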

Here is the issue to incorporate this process into omnibus commands: #4918

Edited Nov 04, 2021 by 🤖 GitLab Bot 🤖
