
Docs: Avoid downtime due to deploy node upgrade

Problem

While performing a zero-downtime update in gitlab#199836 (closed), we encountered downtime.

From gitlab#199836 (comment 280323486):

For zero-downtime upgrades, the deploy node must not be serving requests while it is being upgraded. For example, it should be removed from the load balancer before the upgrade and added back only after the pre-deployment migrations have run.

Also, you might choose a deploy node that runs Sidekiq or geo-logcursor. If so, those services need to be stopped; otherwise the upgraded code will run before the pre-deployment migrations have finished and will likely produce errors. If no other node in the cluster runs those services, users will experience reduced functionality during the upgrade.
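The ordering constraints above can be sketched as a small planner. This is a minimal, hypothetical sketch: the `Node` type, the `plan_deploy_node_upgrade` function and the step strings are illustrative only and are not GitLab or Omnibus APIs, and it assumes `gitlab-ctl stop`/`gitlab-ctl start` accept `sidekiq` and `geo-logcursor` as service names. It only encodes the order (out of the load balancer, pause background services, upgrade, pre-deployment migrations, then back into service) and flags services that no other node provides.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Services that must not run upgraded code on the deploy node before the
# pre-deployment migrations have finished.
PAUSE_DURING_UPGRADE = ("sidekiq", "geo-logcursor")


@dataclass
class Node:
    """Hypothetical description of one node in the cluster."""
    name: str
    services: set[str] = field(default_factory=set)
    in_load_balancer: bool = True


def plan_deploy_node_upgrade(deploy: Node, cluster: list[Node]) -> tuple[list[str], list[str]]:
    """Return (ordered steps, warnings) for upgrading `deploy` without downtime."""
    to_pause = [s for s in PAUSE_DURING_UPGRADE if s in deploy.services]
    steps: list[str] = []
    if deploy.in_load_balancer:
        # The deploy node must not serve requests while it is upgraded.
        steps.append(f"remove {deploy.name} from the load balancer")
    # Stop background services so upgraded code cannot run before migrations.
    steps += [f"gitlab-ctl stop {s}" for s in to_pause]
    steps.append(f"upgrade the GitLab package on {deploy.name}")
    steps.append("run pre-deployment migrations")
    # Only after the migrations is it safe to run the new code again.
    steps += [f"gitlab-ctl start {s}" for s in to_pause]
    if deploy.in_load_balancer:
        steps.append(f"add {deploy.name} back to the load balancer")

    # Warn when pausing a service on the deploy node leaves no other provider.
    others = [n for n in cluster if n.name != deploy.name]
    warnings = [
        f"no other node runs {s}; users will see reduced functionality"
        for s in to_pause
        if not any(s in n.services for n in others)
    ]
    return steps, warnings
```

Keeping the plan as data rather than executing commands directly makes the ordering easy to review before touching a production cluster.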

Partially resolves #5046 (closed).

Side note about stopping Sidekiq

Stopping Sidekiq ungracefully risks corrupting data. Ideally, you would instead:

  1. Find the main Sidekiq PID.
  2. Send it the TSTP signal so it stops picking up new jobs.
  3. Wait for it to finish all running jobs.
  4. Run gitlab-ctl stop sidekiq.
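The four steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not an Omnibus command: it assumes the PID belongs to a single Sidekiq process (found by the caller, e.g. from `gitlab-ctl status sidekiq`), that Sidekiq's process title contains a `[N of M busy]` counter readable from `/proc/<pid>/cmdline` on Linux, and that a 10-minute timeout is acceptable. The helper names (`graceful_stop_sidekiq`, `busy_from_procline`, `read_procline`) are hypothetical.

```python
from __future__ import annotations

import os
import re
import signal
import subprocess
import time
from typing import Callable

# ASSUMPTION: the Sidekiq process title contains "[<busy> of <concurrency> busy]".
_BUSY = re.compile(r"\[(\d+) of \d+ busy\]")


def busy_from_procline(procline: str) -> int:
    """Parse the number of in-flight jobs out of a Sidekiq process title."""
    match = _BUSY.search(procline)
    if match is None:
        raise ValueError(f"unrecognized Sidekiq process title: {procline!r}")
    return int(match.group(1))


def read_procline(pid: int) -> str:
    """Linux-only: read a process title via /proc/<pid>/cmdline."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        return f.read().replace(b"\0", b" ").decode(errors="replace").strip()


def gitlab_ctl_stop_sidekiq() -> None:
    """Step 4: stop the Sidekiq service through Omnibus."""
    subprocess.run(["gitlab-ctl", "stop", "sidekiq"], check=True)


def graceful_stop_sidekiq(
    pid: int,  # step 1: the main Sidekiq PID, found by the caller
    busy_jobs: Callable[[], int] | None = None,
    send_signal: Callable[[int, int], None] = os.kill,
    stop_service: Callable[[], None] = gitlab_ctl_stop_sidekiq,
    poll_interval: float = 1.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Quiet Sidekiq, wait for running jobs, then stop it.

    Returns False (leaving Sidekiq running but quiet) if jobs are still busy
    after `timeout` seconds, so the operator can decide what to do next.
    """
    count_busy = busy_jobs or (lambda: busy_from_procline(read_procline(pid)))
    # Step 2: TSTP asks Sidekiq to stop fetching new jobs.
    send_signal(pid, signal.SIGTSTP)
    # Step 3: poll until no jobs are in flight, giving up after `timeout`.
    waited = 0.0
    while count_busy() > 0:
        if waited >= timeout:
            return False
        sleep(poll_interval)
        waited += poll_interval
    # Step 4: only now is it safe to stop the service.
    stop_service()
    return True
```

Injecting `busy_jobs`, `send_signal`, `stop_service` and `sleep` keeps the waiting logic testable without a live Sidekiq; returning `False` on timeout avoids silently falling back to the ungraceful stop this process is meant to prevent.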

Incorporating this process into Omnibus commands is tracked in #4918.

