Remove Sidekiq shutdown delay in ConcurrencyLimitSampler

What does this MR do and why?

Previously, Sidekiq took at least 30 seconds to shut down because the ConcurrencyLimitSampler thread slept for an additional 30 seconds after receiving a SIGTERM.

As explained in omnibus-gitlab#9136 (comment 2585352651), this was happening:

  1. ConcurrencyLimitSampler#sample ran and slept for 30 seconds.
  2. gitlab-ctl reconfigure attempted to shut down Sidekiq via SIGTERM.
  3. The SIGTERM kicked the thread out of its sleep, but because the while exclusive_lease.same_uuid? condition was still true, the sampler reported metrics and then slept for an additional 30 seconds.

The fix is simple: we should check that the thread is still running before looping again to report metrics.
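
As a rough illustration of that idea, here is a minimal, self-contained Ruby sketch. It is not the actual ConcurrencyLimitSampler code: FakeLease, SketchSampler, report_metrics, and interruptible_sleep are hypothetical names, and the condition-variable wake-up is only an assumed stand-in for however the real daemon interrupts the sleep on shutdown.

```ruby
# Hypothetical stand-in for the exclusive lease: it always reports that this
# process still holds the lease, which is the case that caused the delay.
class FakeLease
  def same_uuid?
    true
  end
end

# Hypothetical sampler sketch; not the real ConcurrencyLimitSampler.
class SketchSampler
  attr_reader :reports

  def initialize(lease:, interval:)
    @lease = lease
    @interval = interval
    @running = true
    @reports = 0
    @mutex = Mutex.new
    @wakeup = ConditionVariable.new
  end

  def running?
    @mutex.synchronize { @running }
  end

  # Models the shutdown path: mark the sampler as stopped and wake the sleeper.
  def stop
    @mutex.synchronize do
      @running = false
      @wakeup.signal
    end
  end

  def sample
    # With only `@lease.same_uuid?` as the condition, a wake-up during
    # shutdown leads to one more report and one more full sleep. Checking
    # `running?` first lets the loop exit as soon as the thread is woken.
    while running? && @lease.same_uuid?
      report_metrics
      interruptible_sleep
    end
  end

  private

  def report_metrics
    @reports += 1
  end

  # Sleeps for up to @interval seconds, but returns early when #stop signals.
  def interruptible_sleep
    @mutex.synchronize do
      @wakeup.wait(@mutex, @interval) if @running
    end
  end
end
```

In this sketch, running sample in a thread with a 30-second interval and then calling stop lets the thread finish almost immediately instead of waiting out another interval.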

Relates to omnibus-gitlab#9136 (closed)

How to set up and validate locally

See omnibus-gitlab#9136 (closed).

The easiest way to reproduce this is:

  1. Install a Linux package on a system.
  2. Make a change to /etc/gitlab/gitlab.rb that causes Sidekiq to restart (setting gitlab_rails['gitlab_default_theme'] to a value between 1 and 11 will work).
  3. Run gitlab-ctl reconfigure.

The reconfigure run should fail with timeouts quite frequently. Then:

  1. Patch /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/samplers/concurrency_limit_sampler.rb.
  2. Run gitlab-ctl restart sidekiq.
  3. Repeat steps 1-3 of the reproduction above.

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Stan Hu
