Sidekiq SLO - authorized_projects
Summary
Sidekiq SLO - authorized_projects
Timeline
All times UTC.
2020-04-07
- 01:29 - Alert about Sidekiq Apdex SLO
- 01:30 - EOC acks and starts looking into it
- 01:36 - EOC determines it is authorized_projects again
- 01:53 - Alert about Sidekiq Queue building up
- 01:54 - EOC acks
- 01:57 - Incident declared from Slack to record the recurrence
- 01:58 - Alert clears | Sidekiq Queue building up
Details
This is the second time in the last 48 hours that we have been paged for the Sidekiq Latency Apdex SLO alert, and it is due to bursty RPS on the authorized_projects queue. It can be seen here: https://dashboards.gitlab.net/d/sidekiq-queue-detail/sidekiq-queue-detail?orgId=1&var-PROMETHEUS_DS=Global&var-environment=gprd&var-stage=main&var-queue=authorized_projects&from=1586116260000&to=1586137919999&theme=light
There is an epic, &176 (closed), that aims to address this problem so that authorized_projects meets the SLOs. I have commented on it.
When this issue happens we usually get paged twice in a row:
- For the Apdex
- For the Sidekiq queue build up
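To illustrate why a burst tends to trigger both pages back to back, here is a minimal sketch of a FIFO queue drained at a fixed rate. It is not the actual alerting logic: the function name `simulate_burst`, the one-second ticks, the capacity, and the latency threshold are all hypothetical, and Apdex is simplified to the fraction of jobs whose queueing delay stays within the threshold.

```python
def simulate_burst(arrivals_per_sec, capacity_per_sec, threshold_sec):
    """Simulate a FIFO queue in one-second ticks (illustrative only).

    arrivals_per_sec: list of job counts enqueued at each second.
    capacity_per_sec: jobs the workers can start per second (hypothetical).
    threshold_sec: queueing delay still counted as satisfactory (hypothetical).

    Returns (apdex, max_backlog), where apdex is the fraction of jobs that
    started within threshold_sec of being enqueued.
    """
    if capacity_per_sec <= 0:
        raise ValueError("capacity_per_sec must be positive")

    backlog = []       # enqueue times of waiting jobs, oldest first
    satisfied = 0
    total = 0
    max_backlog = 0
    t = 0
    while t < len(arrivals_per_sec) or backlog:
        if t < len(arrivals_per_sec):
            backlog.extend([t] * arrivals_per_sec[t])
        # Backlog is sampled after enqueueing and before draining.
        max_backlog = max(max_backlog, len(backlog))
        # Workers pick up at most capacity_per_sec jobs this second.
        started = backlog[:capacity_per_sec]
        backlog = backlog[capacity_per_sec:]
        for enqueued_at in started:
            total += 1
            if t - enqueued_at <= threshold_sec:
                satisfied += 1
        t += 1

    apdex = satisfied / total if total else 1.0
    return apdex, max_backlog


# Same total work (100 jobs), spread evenly versus arriving in one burst.
steady = simulate_burst([10] * 10, capacity_per_sec=10, threshold_sec=2)
bursty = simulate_burst([100] + [0] * 9, capacity_per_sec=10, threshold_sec=2)
print("steady:", steady)
print("bursty:", bursty)
```

In this toy model the same amount of work produces a deep backlog and a low Apdex only when it arrives as a burst, which matches the pattern of the Apdex alert being followed by the queue build-up alert.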
Source
Incident declared by aamarsanaa in Slack via the /incident declare command.
Resources
- If the Situation Zoom room was used, the recording will be automatically uploaded to the Incident room Google Drive folder (private)
Edited by Amarbayar Amarsanaa