Define Sidekiq queue behavior
We need to move from a reactive to a proactive approach to managing Sidekiq queues.
Autoscaling of queues is a long way off because of our provider's limitations, and all of our monitoring is based on reverse engineering, because Sidekiq has always been a black box from the production perspective.
To get ahead of this, we need the following categorization from whoever has the most context on the application internals, so that we can properly plan capacity for this piece of infrastructure:
- List of queues.
- For each queue, whether it is CPU bound or IO bound; for IO-bound queues, whether it is network bound or disk bound.
- For each queue, whether it is user facing or genuinely low-priority background processing.
We need this because we will have to isolate and distribute queues reasonably well after we do https://gitlab.com/gitlab-com/infrastructure/issues/1945. Today, issues like https://gitlab.com/gitlab-com/infrastructure/issues/1973 lead to changes like:
"queue_groups": [
"process_commit",
- "project_cache",
- "build",
- "pipeline",
"system_hook_push",
"update_merge_requests",
- "mailers",
"project_service",
- "project_mirror"
+ "post_receive",
+ "project_web_hook"
]
This does not really solve the problem; it only shifts the problem from one queue to another, perpetuating the toil.
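To make the request concrete, here is a minimal sketch of how the categorization could be recorded and then used to derive queue groups by workload profile instead of reshuffling them by hand. Everything in it is an assumption for illustration: the `QueueProfile` and `Bound` types, the `group_by_profile` helper, and the `example_queue_*` names and their classifications are hypothetical placeholders, not real data about our queues.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Bound(Enum):
    """Dominant resource constraint of a queue's jobs (hypothetical taxonomy)."""
    CPU = "cpu"
    NETWORK_IO = "network_io"
    DISK_IO = "disk_io"


@dataclass(frozen=True)
class QueueProfile:
    """One row of the requested categorization."""
    name: str
    bound: Bound
    user_facing: bool


def group_by_profile(profiles):
    """Group queue names by (bound, user_facing) so each group can be isolated."""
    groups = defaultdict(list)
    for profile in profiles:
        groups[(profile.bound, profile.user_facing)].append(profile.name)
    return {key: sorted(names) for key, names in groups.items()}


# Placeholder data only: these names and classifications are made up.
PROFILES = [
    QueueProfile("example_queue_a", Bound.CPU, user_facing=True),
    QueueProfile("example_queue_b", Bound.NETWORK_IO, user_facing=False),
    QueueProfile("example_queue_c", Bound.NETWORK_IO, user_facing=False),
    QueueProfile("example_queue_d", Bound.DISK_IO, user_facing=True),
]

if __name__ == "__main__":
    for (bound, user_facing), names in group_by_profile(PROFILES).items():
        kind = "user facing" if user_facing else "background"
        print(f"{bound.value} / {kind}: {', '.join(names)}")
```

Each resulting group could map to one `queue_groups` entry, so that a slow background queue does not share workers with a user-facing one. How groups map to actual nodes is left open here.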
@stanhu, who could provide this data?
cc/ @ernstvn