FY22-Q2 All Ops Sub-dept teams perform RCAs for their incidents
Description
For FY22-Q2 we have established a key result (KR) for all Ops Sub-dept teams to perform a Root Cause Analysis (RCA) for each of their incidents. The goal is to maintain a healthy, blameless feedback loop between development and production, and to keep improving based on what we learn from incidents.
More context is available in GitLab's Incident Review Process documentation.
Link to Ally.io KR: https://app.ally.io/objectives/1335501?time_period_id=135090
Process For Ops DRIs to follow
If you are assigned as the DRI for an Incident issue in the table below, please complete the following steps:
- Review and update Timeline on the Incident issue
- Check whether the duration of customer impact can be determined from the timeline. If not, update the timeline with at least the following:
- When the incident first began to impact customers (or best estimate)
- When the customer impact was fully mitigated (or best estimate)
- If this information isn't in the timeline, you may find it in the issue comments or the incident Slack channel.
- Review and update Current Status
- Check whether customer impact is summarized in the Current Status. If it is not, add a brief description of the impact this incident had on customers.
- Review Root Cause Analysis
- Review and complete the Root Cause Analysis section with members of your group. This can be done in the standing weekly Incident Review Synchronous Meeting Sessions, in a team-specific sync meeting, or asynchronously in the issue. Solicit input from the appropriate stable counterparts (e.g. PM, Quality, UX) in addition to Development engineers.
- Review Corrective Actions
- Ensure that the Corrective Actions identified through the RCA process are documented and linked to the incident issue. Based on the learnings, look for opportunities to catch similar problems before production: in development, testing, or staging.
- Mark the issue as reviewed and completed in the table below. We will use this information when scoring this OKR.
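As an illustration of the timeline check above, the duration of customer impact is the difference between the two timeline entries (impact start and full mitigation). The following is a minimal sketch, assuming both entries are recorded as ISO 8601 timestamps with a UTC offset. The function name `impact_duration_minutes` and the sample timestamps are hypothetical and not taken from any real incident.

```python
from datetime import datetime


def impact_duration_minutes(impact_start: str, impact_mitigated: str) -> float:
    """Return the minutes of customer impact between two ISO 8601 timestamps."""
    start = datetime.fromisoformat(impact_start)
    end = datetime.fromisoformat(impact_mitigated)
    if end < start:
        # A mitigation time before the start usually means a timeline typo.
        raise ValueError("mitigation time precedes impact start")
    return (end - start).total_seconds() / 60


# Hypothetical timeline entries (best estimates are acceptable).
duration = impact_duration_minutes(
    "2021-06-01T14:05:00+00:00",  # when customers were first impacted
    "2021-06-01T15:20:00+00:00",  # when the impact was fully mitigated
)
print(f"Customer impact: {duration:.0f} minutes")  # prints "Customer impact: 75 minutes"
```

If either timestamp is only an estimate, it is worth saying so in the timeline entry so that the computed duration is read as approximate.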
Progress
Retro
Stop
Start
Try
Edited by Allison Browne