[TEST] The Sidekiq service is not meeting its latency SLOs
Summary
Start time: 2020-06-04T15:32:52+00:00
Service: Sidekiq
Monitoring tool: PagerDuty
Hosts: unknown
Alert Details
description: The Sidekiq service is not meeting its latency SLOs - https://dashboards.gitlab.net/d/sidekiq-main/sidekiq-overview?orgId=1
severity: high
fingerprint: 00d49ef67934b9ada6afe196f5356c538e100bb3
Summary
This is a training exercise being conducted with the product and engineering teams that are developing incident and alert management features.
Timeline
All times UTC.
2020-06-04
- 14:38 - @mwasilewski-gitlab is triaging the issue by looking at the GitLab Sidekiq service dashboard.
- 14:38 - @brentnewton pauses discussion to point out that we'd be adding timeline updates to the incident issue.
Incident Review
Summary
- Service(s) affected: ~Service::Foo
- Team attribution: ~team::Foo
- Minutes downtime or degradation:
Metrics
Customer Impact
- Who was impacted by this incident? (e.g. external customers, internal customers)
- What was the customer experience during the incident? (e.g. preventing them from doing X, incorrect display of Y, ...)
- How many customers were affected?
- If a precise customer impact number is unknown, what is the estimated potential impact?
Incident Response Analysis
- How was the event detected?
- How could detection time be improved?
- How did we reach the point where we knew how to mitigate the impact?
- How could time to mitigation be improved?
Post Incident Analysis
- How was the root cause diagnosed?
- How could time to diagnosis be improved?
- Do we have an existing backlog item that would've prevented or greatly reduced the impact of this incident?
- Was this incident triggered by a change (deployment of code or change to infrastructure)? If yes, have you linked the issue which represents the change?
Timeline
- YYYY-MM-DD XX:YY UTC: action X taken
- YYYY-MM-DD XX:YY UTC: action Y taken
5 Whys
Lessons Learned
Corrective Actions
Guidelines
Resources
- If the Situation Zoom room was utilised, the recording will be automatically uploaded to the Incident room Google Drive folder (private).
Edited by AnthonySandoval