2023-02-22: Sidekiq Low Urgency CPU Bound queue SLO violation
Current Status
Background jobs, including MR and Pipeline creation, were delayed due to a large increase in the volume of background jobs. This issue has been mitigated and job processing has returned to normal levels.
Note: We haven't seen a recurrence of this issue. However, if you continue to see issues with your MRs not being up to date, please try closing and reopening your MR.
As of 17:30 UTC, queue lengths appear to be recovering, which will translate into increased front-end responsiveness and a return to normal processing of jobs. It will still take some more time, perhaps 30 minutes or more, before things return entirely to normal.
Since 16:17 UTC, an alerting metric for the Low Urgency CPU-bound Sidekiq queue has been in substantial violation of its SLO.
More information will be added as we investigate the issue. Customers who believe they are affected by this incident should subscribe to this issue or monitor our status page for further updates.
📝 Summary for CMOC notice / Exec summary:
- Customer Impact: Background jobs including MR and Pipeline creation were delayed due to a large increase in the volume of background jobs.
- Service Impact: Service::Sidekiq
- Impact Duration: 16:17 UTC 2023-02-22 to 18:00 UTC 2023-02-22
- Root cause: RootCause::Naive-Traffic. The root cause is explained in #8450 (comment 1304473696).
📚 References and helpful links
Recent Events (available internally only):
- Feature Flag Log - Chatops to toggle Feature Flags Documentation
- Infrastructure Configurations
- GCP Events (e.g. host failure)
Deployment Guidance
- Deployments Log | Gitlab.com Latest Updates
- Reach out to Release Managers for S1/S2 incidents to discuss Rollbacks and/or Hot Patching | Rollback Runbook | Hot Patch Runbook
Use the following links to create issues related to this incident if additional work needs to be completed after it is resolved:
- Corrective action ❙ Infradev
- Incident Review ❙ Infra investigation followup
- Confidential Support contact ❙ QA investigation
Note: In some cases we need to redact information from public view. We only do this in a limited number of documented cases. This might include the summary, timeline, or any other information laid out in our handbook page. Any such confidential data will be in a linked issue that is only visible internally. By default, all information we can share will be public, in accordance with our transparency value.