Run ConcurrencyLimitSampler once per interval

What does this MR do and why?

Run ConcurrencyLimitSampler once per interval

Use ExclusiveLease to synchronize the global Sidekiq processes to run ConcurrencyLimitSampler once per sampling interval. This reduces the Redis calls to report queue_size and concurrent_worker_count.

Before this, all Sidekiq processes run the sampler after each interval. This is useless because each process reads from the same Redis and eventually reports the same metrics (or similar depending on the random interval from the sampler's).

Also reduces the sampling interval allowing us to run the sampler more often.

References

gitlab-com/gl-infra/data-access/durability/team#143

Screenshots or screen recordings

Before After

How to set up and validate locally

master

  • Apply the following
diff --git a/lib/gitlab/metrics/samplers/concurrency_limit_sampler.rb b/lib/gitlab/metrics/samplers/concurrency_limit_sampler.rb
index ab2913b96879..e204717353a9 100644
--- a/lib/gitlab/metrics/samplers/concurrency_limit_sampler.rb
+++ b/lib/gitlab/metrics/samplers/concurrency_limit_sampler.rb
@@ -7,6 +7,7 @@ class ConcurrencyLimitSampler < BaseSampler
         DEFAULT_SAMPLING_INTERVAL_SECONDS = 60
 
         def sample
+          @logger.info("Running ConcurrencyLimitSampler pid #{Process.pid}")
           worker_maps.workers.each do |w|
             queue_size = concurrent_limit_service.queue_size(w.name)
             report_queue_size(w.name, queue_size) if queue_size > 0
  • Stop Sidekiq gdk stop rails-background-jobs
  • Run sidekiq cluster with 4 processes
❯ bin/sidekiq-cluster default default default default
{"severity":"INFO","time":"2025-04-25T08:49:12.050Z","message":"Starting cluster with 4 processes"}
{"severity":"INFO","time":"2025-04-25T08:49:12.066Z","message":"Starting metrics server on port 3807"}
  • You should see each process runs the sampler with the timestamp quite close to each other

image

this branch

Apply the following:

diff --git a/app/services/concerns/exclusive_lease_guard.rb b/app/services/concerns/exclusive_lease_guard.rb
index 1f557ef70bcd..1c8881891716 100644
--- a/app/services/concerns/exclusive_lease_guard.rb
+++ b/app/services/concerns/exclusive_lease_guard.rb
@@ -74,7 +74,7 @@ def log_lease_taken
   end
 
   def lease_taken_message
-    "Cannot obtain an exclusive lease. There must be another instance already in execution."
+    "Cannot obtain an exclusive lease. There must be another instance already in execution. pid #{Process.pid}"
   end
 
   def lease_taken_log_level
diff --git a/lib/gitlab/metrics/samplers/concurrency_limit_sampler.rb b/lib/gitlab/metrics/samplers/concurrency_limit_sampler.rb
index aaba725f4520..280fcb7f6572 100644
--- a/lib/gitlab/metrics/samplers/concurrency_limit_sampler.rb
+++ b/lib/gitlab/metrics/samplers/concurrency_limit_sampler.rb
@@ -11,6 +11,7 @@ class ConcurrencyLimitSampler < BaseSampler
         def sample
           start = Time.current
           try_obtain_lease do
+            @logger.info("Running ConcurrencyLimitSampler pid #{Process.pid}")
             worker_maps.workers.each do |w|
               queue_size = concurrent_limit_service.queue_size(w.name)
               report_queue_size(w.name, queue_size) if queue_size > 0
@@ -20,6 +21,8 @@ def sample
             end
 
             duration = (Time.current - start).second
+            @logger.info("ConcurrencyLimitSampler took #{duration.second}s")
+            @logger.info("ConcurrencyLimitSampler sleeping for #{(DEFAULT_SAMPLING_INTERVAL_SECONDS - duration)}s") if duration < DEFAULT_SAMPLING_INTERVAL_SECONDS
             # Sleep to force ExclusiveLease held by this process until it's time for the next sample.
             Kernel.sleep(DEFAULT_SAMPLING_INTERVAL_SECONDS - duration) if duration < DEFAULT_SAMPLING_INTERVAL_SECONDS
           end
  • Restart Sidekiq
  • See the logs below

image

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Marco Gregorius

Merge request reports

Loading