Start using Woodhouse for incident declaration
- Merge remaining Woodhouse MRs
- Compare IMA issue template to Woodhouse's, make sure they match.
- Documentation cutover MRs are approved
- Configure the already-deployed Woodhouse with real integrations for GitLab and PagerDuty
- /woodhouse incident declare can now be used, and /incident declare will keep working.
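
The template comparison above can be done by eye, but a diff is less error-prone. A minimal sketch, assuming both templates have been copied out as plain text; the function name and the placeholder template bodies are hypothetical and not part of IMA or Woodhouse:

```python
import difflib


def template_diff(ima_template: str, woodhouse_template: str) -> list[str]:
    """Return unified-diff lines between the two templates (empty if identical)."""
    return list(
        difflib.unified_diff(
            ima_template.splitlines(),
            woodhouse_template.splitlines(),
            fromfile="ima_template",
            tofile="woodhouse_template",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Placeholder bodies only; paste in the real template text before running.
    ima = "## Summary\n\n## Timeline\n"
    woodhouse = "## Summary\n\n## Timeline\n"
    diff = template_diff(ima, woodhouse)
    print("\n".join(diff) if diff else "templates match")
```

An empty result means the two templates are line-for-line identical; any `-` or `+` lines show what still needs reconciling.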
- Raise a test incident with /woodhouse incident declare, to get confidence in Woodhouse before shadowing the /incident slash command.
- Configure the /incident slash command in Woodhouse as per https://gitlab.com/gitlab-com/gl-infra/woodhouse#installing-slack-app
  - Invocations of this will now be sent to Woodhouse, not IMA
- Configure a production project issue webhook as documented: https://gitlab.com/gitlab-com/gl-infra/woodhouse#gitlab-webhook-integration
  - IMA and Woodhouse will now each report incident issue events, which is noisy
- Disable the IMA's production GitLab webhook by appending "-DELETEME-TO-ENABLE" to the secret token.
  - Now only Woodhouse handles GitLab webhooks
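
Why appending a suffix to the secret token disables the webhook: GitLab sends the configured secret in the `X-Gitlab-Token` request header, and a receiver that validates it will reject any request whose token no longer matches. A minimal sketch, assuming IMA compares the header against its own configured secret; the function names are hypothetical and this is not IMA's actual code:

```python
import hmac

DISABLE_SUFFIX = "-DELETEME-TO-ENABLE"


def token_matches(configured_secret: str, header_token: str) -> bool:
    """Constant-time comparison of the receiver's secret with X-Gitlab-Token."""
    return hmac.compare_digest(configured_secret.encode(), header_token.encode())


def disable_token(token: str) -> str:
    """Append the marker suffix (idempotent), as entered in the webhook settings."""
    return token if token.endswith(DISABLE_SUFFIX) else token + DISABLE_SUFFIX


def enable_token(token: str) -> str:
    """Strip the marker suffix to restore the original token."""
    if token.endswith(DISABLE_SUFFIX):
        return token[: -len(DISABLE_SUFFIX)]
    return token


if __name__ == "__main__":
    secret = "example-secret"  # placeholder value, not a real token
    print(token_matches(secret, secret))
    print(token_matches(secret, disable_token(secret)))
```

The suffix keeps the original token recoverable, so rolling back is a matter of deleting the marker and leaving the rest of the webhook configuration alone.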
- Test Woodhouse's real integrations
  - Slack the EOC, IMOC, and CMOC to check whether this is a good time for them to get paged.
  - In #production: /woodhouse incident declare
  - In the modal, tick all pager boxes
  - We should see an incident issue and a Slack channel, and all on-calls should be paged.
  - Close, then reopen the incident issue. Woodhouse should post in Slack about the reopen.
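
For the close/reopen check above, it may help to know what the reopen looks like on the wire. GitLab issue webhooks carry `object_kind: "issue"` and an `object_attributes.action` of `open`, `close`, `reopen`, or `update`. A minimal sketch of recognising the reopen event; the function name and the sample payload are illustrative, and this is not Woodhouse's actual handler:

```python
def is_issue_reopen(payload: dict) -> bool:
    """True if a GitLab webhook payload describes an issue being reopened."""
    if payload.get("object_kind") != "issue":
        return False
    attrs = payload.get("object_attributes") or {}
    return attrs.get("action") == "reopen"


if __name__ == "__main__":
    # Trimmed-down sample payload; real payloads carry many more fields.
    sample = {
        "object_kind": "issue",
        "object_attributes": {"action": "reopen", "state": "opened"},
    }
    print(is_issue_reopen(sample))
```

If the Slack post does not appear during the test, the webhook's recent deliveries in the project settings are a reasonable first place to check whether a payload like this was sent at all.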
- Merge documentation cutover MRs
- Configure periodic archival of old incident Slack channels: https://gitlab.com/gitlab-com/gl-infra/woodhouse#slack-archive-incident-channels-subcommand
- Write up deprecation schedule for classic IMA (in another issue, link here)
  - Remove now-unused PagerDuty webhooks
  - Remove GitLab webhooks
  - Turn down the IMA application
  - Write issues to replace remaining IMA functionality, like the @sre-oncall schedule populator cronjob.
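
The periodic archival step above boils down to selecting incident channels past a certain age. A minimal sketch of that selection, assuming channel records shaped like the Slack `conversations.list` response (`name`, `created` as a Unix timestamp, `is_archived`). The `incident-` prefix and the 14-day threshold are assumptions for illustration; the criteria Woodhouse's subcommand actually uses are described in its README and may differ:

```python
from datetime import datetime, timedelta, timezone


def channels_to_archive(
    channels: list[dict],
    now: datetime,
    max_age: timedelta = timedelta(days=14),  # assumed threshold
    prefix: str = "incident-",  # assumed naming convention
) -> list[str]:
    """Names of unarchived channels with the prefix created before now - max_age."""
    cutoff = now - max_age
    return [
        c["name"]
        for c in channels
        if c["name"].startswith(prefix)
        and not c.get("is_archived", False)
        and datetime.fromtimestamp(c["created"], tz=timezone.utc) < cutoff
    ]


if __name__ == "__main__":
    now = datetime(2021, 1, 31, tzinfo=timezone.utc)
    old = int(datetime(2021, 1, 1, tzinfo=timezone.utc).timestamp())
    new = int(datetime(2021, 1, 30, tzinfo=timezone.utc).timestamp())
    sample = [
        {"name": "incident-100", "created": old, "is_archived": False},
        {"name": "incident-101", "created": new, "is_archived": False},
        {"name": "production", "created": old, "is_archived": False},
    ]
    print(channels_to_archive(sample, now))
```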
Edited by Craig Furman