Optimize ConcurrencyLimit::ResumeWorker performance

What does this MR do and why?

In gitlab-com/gl-infra/production#20567 (comment 2777839251), we noticed jobs accumulating indefinitely in the concurrency limit queue for some workers. This happens when the rate of incoming jobs exceeds the rate of resumed jobs, and we observed this backlog on the busiest workers.

This MR attempts to optimize ConcurrencyLimit::ResumeWorker by resuming as many jobs as it can within a single execution, instead of stopping at a fixed batch of 5000 jobs.
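The idea can be sketched with a small self-contained simulation. Note this is illustrative pseudocode under assumptions, not the actual GitLab implementation: `resume_all`, `BATCH_SIZE`, and the plain-array queue are hypothetical stand-ins for the real worker, which drains the backlog in batches until the queue is empty or the worker's concurrency limit is reached.

```ruby
# Illustrative sketch only: instead of resuming one fixed batch of 5000
# jobs per execution, keep pulling batches until the queue is drained or
# the remaining concurrency budget is used up.
BATCH_SIZE = 1_000 # arbitrary batch size for the sketch

def resume_all(queue, limit_remaining)
  resumed = 0
  until queue.empty? || resumed >= limit_remaining
    # Array#shift(n) removes and returns up to n leading elements
    batch = queue.shift([BATCH_SIZE, limit_remaining - resumed].min)
    resumed += batch.size
  end
  resumed
end

queue = (1..12_000).to_a
puts resume_all(queue, 100_000)
```

With a backlog of 12,000 jobs and sufficient budget, a single execution drains the whole queue rather than leaving 7,000 jobs for the next run.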

References

Pseudocode from gitlab-com/gl-infra/production#20567 (comment 2779201333)

How to set up and validate locally

  1. Apply this diff

    diff --git a/app/workers/chaos/sleep_worker.rb b/app/workers/chaos/sleep_worker.rb
    index 43b851a9f264..41403388ae2e 100644
    --- a/app/workers/chaos/sleep_worker.rb
    +++ b/app/workers/chaos/sleep_worker.rb
    @@ -9,6 +9,8 @@ class SleepWorker # rubocop:disable Scalability/IdempotentWorker
         sidekiq_options retry: 3
         include ChaosQueue
     
    +    concurrency_limit -> { 10 }
    +
         def perform(duration_s)
           Gitlab::Chaos.sleep(duration_s)
         end
    
  2. Schedule a lot of jobs

    while true
      Chaos::SleepWorker.perform_async(1)
    end
  3. On a separate console, keep checking the queue size

    Gitlab::SidekiqMiddleware::ConcurrencyLimit::ConcurrencyLimitService.new("Chaos::SleepWorker").queue_size
  4. Once there are enough jobs in the queue, stop the loop in step 2.

  5. Check that the queue size is decreasing.
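If everything is wired up correctly, step 5 behaves like the toy simulation below. This is a hedged illustration only: the backlog and per-tick numbers are arbitrary assumptions, not the worker's real limits.

```ruby
# Toy simulation of step 5: once the producer from step 2 stops, each
# ResumeWorker tick should only shrink the backlog until it hits zero.
queue_size = 50_000      # assumed starting backlog
resume_per_tick = 5_000  # assumed jobs resumed per tick
sizes = [queue_size]
until queue_size.zero?
  queue_size = [queue_size - resume_per_tick, 0].max
  sizes << queue_size
end
# the observed queue sizes should be strictly decreasing
puts sizes.each_cons(2).all? { |a, b| b < a }
```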

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Marco Gregorius
