Mini RCA for alert: `BuildQueueWorker` Sidekiq worker (`main` stage) is not meeting its latency SLOs
Incident
This was not a major incident, but I think it's worth studying closely given the changes we are making to gerrymander the Sidekiq worker configurations.
https://gitlab.slack.com/archives/CD6HFD1L0/p1566437661415200
Takeaways
- Even though there was a glut in BuildQueue worker times, this did not have an impact on other jobs on other queues on shared worker infrastructure: https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/7642#note_206691575
- The problem appears to be a glut of `BuildQueueWorker` jobs, all for a single project, each of which took about 50 seconds to complete: https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/7642#note_206705995
Edited by Andrew Newdigate