Commit be0c9435 authored by Devin Sylva, committed by Cameron McFarland

Added oncall checklist

parent 13186ee4
......@@ -7,6 +7,26 @@ By performing these tasks we will keep the [broken window
effect](https://en.wikipedia.org/wiki/Broken_windows_theory) under control, preventing future pain
and mess.
## Going on call
Here is a suggested checklist of things to do at the start of an on-call shift:
- *Change Slack Icon*: Click your name. Click `Set status`. Click the grey smiley face. Type `:pagerduty:`. Set `Clear after` to the end of your on-call shift. Click `Save`.
- *Add On-Call Feed*: PM yourself in Slack `/feed add https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues.atom?feed_token=(TOKEN)&label_name%5B%5D=oncall&scope=all&state=opened&utf8=%E2%9C%93`
- *Add Production Feed*: PM yourself in Slack `/feed add https://gitlab.com/gitlab-com/gl-infra/production/issues.atom?feed_token=(TOKEN)&label_name%5B%5D=incident&state=opened`
- *Join alert channels*: If not already a member, `/join` `#alerts`, `#alerts-general`, `#alerts-prod-abuse`, `#alerts-ops`
- *Turn on Slack channel notifications*: Open the `#production` Notification Preferences (and optionally those for `#infra-lounge`). Set Desktop and Mobile to `All new messages`.
- *Turn on Slack alert notifications*: Open the `#alerts` and `#alerts-general` Notification Preferences. Set Desktop only to `All new messages`.
- At the start of each on-call day, read all S1 incidents at: https://gitlab.com/gitlab-com/gl-infra/production/issues?scope=all&utf8=✓&state=opened&label_name%5B%5D=incident&label_name%5B%5D=S1
At the end of a shift:
- *Remove feeds*: PM yourself in Slack `/feed list`, then `/feed remove (number)` for the production and on-call feeds
- *Turn off Slack channel notifications*: Open the `#production`, `#alerts`, and `#alerts-general` Notification Preferences and return them to your preferred values.
- *Leave noisy alert channels*: `/leave` alert channels (it's good to stay in `#alerts` and `#alerts-general`)
- Comment on any open S1 incidents at: https://gitlab.com/gitlab-com/gl-infra/production/issues?scope=all&utf8=✓&state=opened&label_name%5B%5D=incident&label_name%5B%5D=S1
- At the end of each on-call day, post a quick update in Slack so the next person is aware of anything ongoing, any false alerts, or anything that needs to be handed over.
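
The two `/feed add` commands in the start-of-shift checklist differ only in project path and query parameters, so they can be generated rather than hand-edited. The following is a minimal sketch, not part of the runbook: the function names are hypothetical, and `TOKEN` stands in for your personal feed token exactly as `(TOKEN)` does in the checklist.

```python
from urllib.parse import urlencode

# Base path shared by both issue trackers named in the checklist.
GL_INFRA_BASE = "https://gitlab.com/gitlab-com/gl-infra"


def oncall_feed_url(token: str) -> str:
    """Atom feed of open issues labelled `oncall` in the infrastructure tracker."""
    query = urlencode([
        ("feed_token", token),
        ("label_name[]", "oncall"),  # encoded as label_name%5B%5D
        ("scope", "all"),
        ("state", "opened"),
        ("utf8", "✓"),               # encoded as %E2%9C%93
    ])
    return f"{GL_INFRA_BASE}/infrastructure/issues.atom?{query}"


def production_feed_url(token: str) -> str:
    """Atom feed of open issues labelled `incident` in the production tracker."""
    query = urlencode([
        ("feed_token", token),
        ("label_name[]", "incident"),
        ("state", "opened"),
    ])
    return f"{GL_INFRA_BASE}/production/issues.atom?{query}"


def feed_add_command(url: str) -> str:
    """Slack command to paste into a message to yourself."""
    return f"/feed add {url}"


if __name__ == "__main__":
    # "TOKEN" is a placeholder; substitute your own feed token.
    print(feed_add_command(oncall_feed_url("TOKEN")))
    print(feed_add_command(production_feed_url("TOKEN")))
```

Treat the feed token as a secret: prefer reading it from an environment variable or a password manager over saving it in a script.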
## Things to keep an eye on
### On-call issues
......@@ -16,6 +36,10 @@ happening lately. Also, keep an eye on the [#production][slack-production] and
[#incident-management][slack-incident-management] channels for discussion around any on-going
issues.
### Useful dashboard to keep open
- [GitLab Triage](https://dashboards.gitlab.net/d/RZmbBr7mk/gitlab-triage?orgId=1&refresh=30s)
### Alerts
Start by checking how many alerts are in flight right now
......