Document canonical Sidekiq routing rules for reference architectures
A customer ran into high Redis CPU utilization when upgrading from GitLab v16.3 to v16.6. I suspect the increased number of queues (45) may be causing this CPU saturation. I count 676 queues being watched in the latest nightly. Two years ago, when @cmiskell wrote https://about.gitlab.com/blog/2021/09/02/specialized-sidekiq-configuration-lessons-from-gitlab-dot-com/, we "only" had 440 queues.
In https://docs.gitlab.com/ee/administration/sidekiq/extra_sidekiq_processes.html, we document a naive setup:
```ruby
sidekiq['queue_groups'] = ['*'] * 4
```
However, even on a single node this is particularly hard on Redis, because each of the 4 processes runs 50 threads and issues `BRPOP` commands spanning over 650 queues!
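One partial mitigation, sketched here against the Omnibus `gitlab.rb` settings used in those docs (the concurrency values are illustrative, not a recommendation), is to cap per-process concurrency so fewer threads poll Redis at once:

```ruby
# Still listens to every queue, but 4 processes x 10 threads = 40 pollers
# instead of the default 4 x 50 = 200.
sidekiq['queue_groups'] = ['*'] * 4
sidekiq['min_concurrency'] = 10
sidekiq['max_concurrency'] = 10
```

That only reduces the number of concurrent `BRPOP` calls, though; each call still enumerates all ~676 queues, which is why routing rules are the real fix.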
In https://docs.gitlab.com/ee/administration/sidekiq/processing_specific_job_classes.html#migrating-from-queue-selectors-to-routing-rules, we provide a sample configuration for routing rules:
```ruby
sidekiq['min_concurrency'] = 20
sidekiq['max_concurrency'] = 20
sidekiq['routing_rules'] = [
  ['urgency=high', 'high_urgency'],
  ['urgency=low', 'low_urgency'],
  ['urgency=throttled', 'throttled_urgency'],
  # Wildcard matching, route the rest to `default` queue
  ['*', 'default']
]
sidekiq['queue_selector'] = false
sidekiq['queue_groups'] = [
  'high_urgency',
  'low_urgency',
  'throttled_urgency',
  'default,mailers'
]
```
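With this layout, each entry in `queue_groups` starts one `sidekiq-cluster` process that polls only its named queues (the comma-joined `default,mailers` entry gives one process both queues), so every `BRPOP` spans one or two queues instead of several hundred.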
Whereas on GitLab.com, we have these settings in https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/blob/df779737f3af84a92f14a2ec1cd32a810315dd34/releases/gitlab/values/gprd.yaml.gotmpl#L959-969:
- ["worker_name=AuthorizedProjectUpdate::UserRefreshFromReplicaWorker,AuthorizedProjectUpdate::UserRefreshWithLowUrgencyWorker", "quarantine"] # move this to the quarantine shard
- ["worker_name=AuthorizedProjectsWorker", "urgent_authorized_projects"] # urgent-authorized-projects
- ["resource_boundary=cpu&urgency=high", "urgent_cpu_bound"] # urgent-cpu-bound
- ["resource_boundary=memory", "memory_bound"] # memory-bound
- ["feature_category=global_search&urgency=throttled", "elasticsearch"] # elasticsearch
- ["resource_boundary!=cpu&urgency=high", "urgent_other"] # urgent-other
- ["resource_boundary=cpu&urgency=default,low", "low_urgency_cpu_bound"] # low-urgency-cpu-bound
- ["feature_category=database&urgency=throttled", "database_throttled"] # database-throttled
- ["feature_category=gitaly&urgency=throttled", "gitaly_throttled"] # gitaly-throttled
- ["*", "default"] # catchall on k8s
Yet another customer in https://gitlab.com/gitlab-org/distribution/team-tasks/-/issues/1422#note_1690949576 has an even more surprising config: only two queues, one for search and another for everything else!
```ruby
sidekiq['max_concurrency'] = '25'
sidekiq['routing_rules'] = [
  ["feature_category=global_search", "global_search"],
  ['*', 'default'],
]
sidekiq['queue_groups'] = [
  'global_search',
  'global_search',
  'global_search',
  'global_search',
  'global_search'
]
```
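If I'm reading that correctly, repeating `global_search` five times starts five processes that each poll only that single queue, and nothing in this `queue_groups` list polls `default` at all, so presumably other Sidekiq nodes pick up the catch-all queue.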
I think we need to make it clearer that the naive setup of listening to all queues is no longer recommended, but we should also document a canonical example that will work for most installations.
The docs in https://docs.gitlab.com/ee/administration/sidekiq/processing_specific_job_classes.html#migrating-from-queue-selectors-to-routing-rules seem to be a good starting point, but I wonder if we should take some of the learnings from GitLab.com and tune this?
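As a straw-man only (my sketch, not an agreed recommendation), a middle ground might keep the urgency-based split from the docs and borrow GitLab.com's resource-boundary shards for CPU- and memory-bound work:

```ruby
# Straw-man gitlab.rb config; shard names are illustrative.
sidekiq['min_concurrency'] = 20
sidekiq['max_concurrency'] = 20
sidekiq['routing_rules'] = [
  ['resource_boundary=cpu&urgency=high', 'urgent_cpu_bound'], # from GitLab.com
  ['resource_boundary=memory', 'memory_bound'],               # from GitLab.com
  ['urgency=high', 'high_urgency'],
  ['urgency=throttled', 'throttled_urgency'],
  ['*', 'default']                                            # catch-all, incl. urgency=low
]
sidekiq['queue_groups'] = [
  'urgent_cpu_bound',
  'memory_bound',
  'high_urgency',
  'throttled_urgency',
  'default,mailers'
]
```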
@cmiskell, @engwan, @qmnguyen0711, @grantyoung What do you think?