Incident Metrics: A Visual Guide (!111658) · Merge requests · GitLab.com / www-gitlab-com

Alana Bellucci requested to merge abellucci-master-patch-18384 into master Sep 19, 2022

Why is this change being made?

Incident Management Metrics: A Visual Guide - 1... (#13986 - closed) is part two of the three-part Blog Post Series for GitLab Incident Management... (&1945).

Metrics to include:

First product impact: The first moment of severe impact to the product (start time).
Mean time to detection (MTTD): When the operator becomes aware of the problem (impact detected).
Service Level Agreement (SLA): The time frame in which you can expect the first response (response initiated). SLA times are not an expected time to resolution.
Severity
- severity1: Service is unavailable or completely unusable (30 Minutes)
- severity2: Service is highly degraded, there is no workaround, and there is a significant business impact (4 hours)
- severity3: Something is preventing normal service operation, but there is a workaround (8 hours)
- severity4: There are questions or clarifications around features or documentation that have minimal or no business impact (24 hours)
Mean time to mitigate (MTTM): When there is no longer severe product impact (impact mitigated). The system may still be degraded in some way.
Mean time to recovery (MTTR): When the system has fully recovered and is operating normally (end time). Note: Recovery and mitigation sometimes coincide and sometimes differ. MTTR is the same as the DORA metric Time to restore service: the time an incident was open in a production environment over the given time period.
Mean time between incidents (MTBI): The time between the full recovery of the system and the first product degradation of the next incident.
~~Service Level Objectives (SLO): target for the proper level of reliability~~
~~Service Level Indicators (SLI): a metric that tells you how your service is operating from the perspective of your users; i.e., can a user load a page quickly enough.~~
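
To make the definitions above concrete, here is a minimal Python sketch of how the metrics could be derived from incident timestamps. It rests on several assumptions that this MR does not state: the `Incident` class and its field names are hypothetical (they mirror the labels start time, impact detected, response initiated, impact mitigated, and end time); MTTD, MTTM, and MTTR are each measured from the start time; the parenthesized durations in the severity list are treated as first-response targets; and the SLA clock is assumed to start at impact detected.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean

# Assumption: the durations in the severity list are first-response targets.
SLA_BY_SEVERITY = {
    "severity1": timedelta(minutes=30),
    "severity2": timedelta(hours=4),
    "severity3": timedelta(hours=8),
    "severity4": timedelta(hours=24),
}


@dataclass
class Incident:
    """Hypothetical record; field names mirror the labels in the list above."""
    severity: str
    start_time: datetime          # first product impact
    impact_detected: datetime     # operator becomes aware
    response_initiated: datetime  # first response
    impact_mitigated: datetime    # no longer severe impact
    end_time: datetime            # full recovery


def mean_minutes(deltas):
    """Average a list of timedeltas and return the result in minutes."""
    return mean(d.total_seconds() for d in list(deltas)) / 60


def summarize(incidents):
    """Return MTTD, MTTM, MTTR, and MTBI in minutes (MTBI is None for < 2 incidents)."""
    ordered = sorted(incidents, key=lambda i: i.start_time)
    # Assumption: every "mean time to ..." interval starts at first product impact.
    summary = {
        "mttd": mean_minutes([i.impact_detected - i.start_time for i in ordered]),
        "mttm": mean_minutes([i.impact_mitigated - i.start_time for i in ordered]),
        "mttr": mean_minutes([i.end_time - i.start_time for i in ordered]),
    }
    # MTBI: gap between one incident's full recovery and the next one's first impact.
    gaps = [nxt.start_time - prev.end_time for prev, nxt in zip(ordered, ordered[1:])]
    summary["mtbi"] = mean_minutes(gaps) if gaps else None
    return summary


def sla_met(incident):
    """Assumption: the SLA clock runs from impact detected to response initiated."""
    waited = incident.response_initiated - incident.impact_detected
    return waited <= SLA_BY_SEVERITY[incident.severity]
```

With two sample incidents, one detected after 10 minutes and one after 20, `summarize` would report an MTTD of 15 minutes. Measuring every interval from the start time keeps the metrics directly comparable on a single timeline; a team that starts its clocks at detection instead would need to change the subtraction baseline.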

Sources

Edited Nov 09, 2022 by Alana Bellucci
