Sidekiq queue `authorized_projects` is being throttled
In gitlab-com/gl-infra/scalability#25 (comment 226089366), we're reviewing the current GitLab.com Sidekiq data to define proper SLOs for each Sidekiq job.
While performing this analysis, I noticed that jobs in the `authorized_projects` queue often wait much longer to be scheduled than jobs in any other queue on the realtime nodes.
This appears to be caused by two things together: the high volume of traffic to this queue, and its priority of 1, the lowest of any queue on the realtime nodes.
The queues processed on the realtime nodes, with their priorities (from https://gitlab.com/gitlab-org/gitlab/blob/master/config/sidekiq_queues.yml), are:
- [authorized_projects, 1]
- [email_receiver, 2]
- [gitlab_shell, 2]
- [merge, 5]
- [new_issue, 2]
- [new_merge_request, 2]
- [process_commit, 3]
- [reactive_caching, 1]
- [update_merge_requests, 3]
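To make the effect of these weights concrete, here is a minimal Python sketch of weighted queue polling. It assumes Sidekiq's default (non-strict) fetch behaviour as I understand it: each queue is repeated once per unit of weight, the list is shuffled and de-duplicated, and the worker polls the queues in that order, taking a job from the first non-empty one. The helper names `first_poll_share`, `poll_order` and `simulated_first_share` are hypothetical, and the fetcher GitLab.com actually runs may differ in detail.

```python
import random
from fractions import Fraction

# Realtime queue weights, copied from the list above (config/sidekiq_queues.yml).
REALTIME_QUEUES = {
    "authorized_projects": 1,
    "email_receiver": 2,
    "gitlab_shell": 2,
    "merge": 5,
    "new_issue": 2,
    "new_merge_request": 2,
    "process_commit": 3,
    "reactive_caching": 1,
    "update_merge_requests": 3,
}


def first_poll_share(weights):
    """Exact probability that each queue is polled first on a fetch.

    The first element of a shuffled list is uniform over its slots, and a
    queue occupies `weight` slots, so its share is weight / total weight.
    """
    total = sum(weights.values())
    return {queue: Fraction(w, total) for queue, w in weights.items()}


def poll_order(weights, rng):
    """One fetch's polling order under the assumed shuffle-by-weight model."""
    expanded = [queue for queue, w in weights.items() for _ in range(w)]
    rng.shuffle(expanded)
    # De-duplicate while keeping the first occurrence of each queue.
    return list(dict.fromkeys(expanded))


def simulated_first_share(weights, queue, trials=100_000, seed=0):
    """Monte Carlo estimate of how often `queue` is polled first."""
    rng = random.Random(seed)
    hits = sum(poll_order(weights, rng)[0] == queue for _ in range(trials))
    return hits / trials
```

With these weights (total 21), `authorized_projects` is polled first on only 1 in 21 fetches (about 4.8%), against 5 in 21 (about 24%) for `merge`. When several queues have work waiting, that ordering pushes `authorized_projects` jobs to the back, which is consistent with the scheduling delays described below.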
`reactive_caching` also has a priority of 1, but receives only about 45% of the traffic that `authorized_projects` receives.
`authorized_projects` has a p95 scheduling latency of around 10 seconds, although in the past week this has spiked to as much as 2 minutes.
Questions
- Is `authorized_projects` latency-sensitive? That is, if these jobs take a long time to run, does that affect users' expectations?
Proposal
Bump `authorized_projects` up to priority 2, and add another realtime node to help absorb the peak volumes that are currently causing it to be throttled.
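As a rough check of the priority part of this proposal, the sketch below compares how often `authorized_projects` would be polled first before and after the bump. It assumes the same shuffle-by-weight fetch model (each queue's share is its weight divided by the total weight), and `first_poll_share` is a hypothetical helper. It says nothing about the extra node, which adds capacity rather than changing the polling order.

```python
from fractions import Fraction

# Current realtime queue weights from config/sidekiq_queues.yml.
CURRENT = {
    "authorized_projects": 1,
    "email_receiver": 2,
    "gitlab_shell": 2,
    "merge": 5,
    "new_issue": 2,
    "new_merge_request": 2,
    "process_commit": 3,
    "reactive_caching": 1,
    "update_merge_requests": 3,
}

# Proposed change: authorized_projects moves from weight 1 to weight 2.
PROPOSED = {**CURRENT, "authorized_projects": 2}


def first_poll_share(weights, queue):
    """Probability `queue` is polled first, assuming shuffle-by-weight fetching."""
    return Fraction(weights[queue], sum(weights.values()))


before = first_poll_share(CURRENT, "authorized_projects")
after = first_poll_share(PROPOSED, "authorized_projects")
print(f"before: {before} ({float(before):.1%}), after: {after} ({float(after):.1%})")
```

Under this model the bump takes the share from 1/21 (about 4.8%) to 2/22 = 1/11 (about 9.1%), roughly doubling how often the queue is polled first, while every other queue's share drops only slightly.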
