Run ConcurrencyLimitSampler once per interval
What does this MR do and why?
Run ConcurrencyLimitSampler once per interval
Use ExclusiveLease to synchronize the global Sidekiq processes to run
ConcurrencyLimitSampler once per sampling interval. This reduces the
Redis calls to report queue_size and concurrent_worker_count.
Before this, all Sidekiq processes run the sampler after each interval. This is useless because each process reads from the same Redis and eventually reports the same metrics (or similar depending on the random interval from the sampler's).
Also reduces the sampling interval allowing us to run the sampler more often.
References
gitlab-com/gl-infra/data-access/durability/team#143
Screenshots or screen recordings
| Before | After |
|---|---|
How to set up and validate locally
master
- Apply the following
diff --git a/lib/gitlab/metrics/samplers/concurrency_limit_sampler.rb b/lib/gitlab/metrics/samplers/concurrency_limit_sampler.rb
index ab2913b96879..e204717353a9 100644
--- a/lib/gitlab/metrics/samplers/concurrency_limit_sampler.rb
+++ b/lib/gitlab/metrics/samplers/concurrency_limit_sampler.rb
@@ -7,6 +7,7 @@ class ConcurrencyLimitSampler < BaseSampler
DEFAULT_SAMPLING_INTERVAL_SECONDS = 60
def sample
+ @logger.info("Running ConcurrencyLimitSampler pid #{Process.pid}")
worker_maps.workers.each do |w|
queue_size = concurrent_limit_service.queue_size(w.name)
report_queue_size(w.name, queue_size) if queue_size > 0
- Stop Sidekiq
gdk stop rails-background-jobs - Run sidekiq cluster with 4 processes
❯ bin/sidekiq-cluster default default default default
{"severity":"INFO","time":"2025-04-25T08:49:12.050Z","message":"Starting cluster with 4 processes"}
{"severity":"INFO","time":"2025-04-25T08:49:12.066Z","message":"Starting metrics server on port 3807"}
- You should see each process runs the sampler with the timestamp quite close to each other
this branch
Apply the following:
diff --git a/app/services/concerns/exclusive_lease_guard.rb b/app/services/concerns/exclusive_lease_guard.rb
index 1f557ef70bcd..1c8881891716 100644
--- a/app/services/concerns/exclusive_lease_guard.rb
+++ b/app/services/concerns/exclusive_lease_guard.rb
@@ -74,7 +74,7 @@ def log_lease_taken
end
def lease_taken_message
- "Cannot obtain an exclusive lease. There must be another instance already in execution."
+ "Cannot obtain an exclusive lease. There must be another instance already in execution. pid #{Process.pid}"
end
def lease_taken_log_level
diff --git a/lib/gitlab/metrics/samplers/concurrency_limit_sampler.rb b/lib/gitlab/metrics/samplers/concurrency_limit_sampler.rb
index aaba725f4520..280fcb7f6572 100644
--- a/lib/gitlab/metrics/samplers/concurrency_limit_sampler.rb
+++ b/lib/gitlab/metrics/samplers/concurrency_limit_sampler.rb
@@ -11,6 +11,7 @@ class ConcurrencyLimitSampler < BaseSampler
def sample
start = Time.current
try_obtain_lease do
+ @logger.info("Running ConcurrencyLimitSampler pid #{Process.pid}")
worker_maps.workers.each do |w|
queue_size = concurrent_limit_service.queue_size(w.name)
report_queue_size(w.name, queue_size) if queue_size > 0
@@ -20,6 +21,8 @@ def sample
end
duration = (Time.current - start).second
+ @logger.info("ConcurrencyLimitSampler took #{duration.second}s")
+ @logger.info("ConcurrencyLimitSampler sleeping for #{(DEFAULT_SAMPLING_INTERVAL_SECONDS - duration)}s") if duration < DEFAULT_SAMPLING_INTERVAL_SECONDS
# Sleep to force ExclusiveLease held by this process until it's time for the next sample.
Kernel.sleep(DEFAULT_SAMPLING_INTERVAL_SECONDS - duration) if duration < DEFAULT_SAMPLING_INTERVAL_SECONDS
end
- Restart Sidekiq
- See the logs below
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

