
Incident Metrics: A Visual Guide

Alana Bellucci requested to merge abellucci-master-patch-18384 into master

Why is this change being made?

Incident Management Metrics: A Visual Guide - 1... (#13986 - closed) is part two of a three-part series in the Blog Post Series for GitLab Incident Management... (&1945).

Metrics to include:

  • First product impact: The first moment of severe impact to the product. Timeline point: start time.
  • Mean time to detection (MTTD): The average time until the operator becomes aware of the problem. Timeline point: impact detected.
  • Service Level Agreement (SLA): The time frame within which you can expect a first response. SLA times are not an expected time to resolution. Timeline point: response initiated.
  • Severity (first-response SLA in parentheses)
    • severity1: Service is unavailable or completely unusable (30 minutes)
    • severity2: Service is highly degraded, there is no workaround, and there is a significant business impact (4 hours)
    • severity3: Something is preventing normal service operation, but there is a workaround (8 hours)
    • severity4: There are questions or clarifications about features or documentation that have minimal or no business impact (24 hours)
  • Mean time to mitigate (MTTM): The average time until there is no longer severe product impact. The system may still be degraded in some way. Timeline point: impact mitigated.
  • Mean time to recovery (MTTR): The average time until the system has fully recovered and is operating normally. Note: Sometimes recovery and mitigation happen at the same moment, and sometimes they do not. MTTR is the same as the DORA metric Time to restore service: the time an incident was open in a production environment over the given time period. Timeline point: end time.
  • Mean time between incidents (MTBI): The average time between the full recovery of the system and the first product degradation that follows the incident.
  • Service Level Objectives (SLO): A target for the appropriate level of reliability.
  • Service Level Indicators (SLI): A metric that tells you how your service is operating from the perspective of your users, e.g., can a user load a page quickly enough?
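To make the timeline metrics concrete, here is a minimal sketch of how MTTD, MTTM, MTTR and MTBI could be computed from a list of incidents. It assumes each incident records one timestamp per timeline point (start time, impact detected, response initiated, impact mitigated, end time) and that the "mean time to" metrics are measured from first product impact; the `Incident` record, its field names and the helper functions are hypothetical and not part of any GitLab API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    """Hypothetical incident record: one timestamp per point on the timeline."""
    start: datetime      # first product impact
    detected: datetime   # impact detected
    responded: datetime  # response initiated
    mitigated: datetime  # impact mitigated
    end: datetime        # full recovery


def mean_duration(durations: list[timedelta]) -> timedelta:
    """Average timedeltas by converting each one to seconds first."""
    return timedelta(seconds=mean(d.total_seconds() for d in durations))


def incident_metrics(incidents: list[Incident]) -> dict[str, timedelta]:
    """MTTD, MTTM and MTTR measured from first product impact, plus MTBI."""
    ordered = sorted(incidents, key=lambda i: i.start)
    metrics = {
        "MTTD": mean_duration([i.detected - i.start for i in ordered]),
        "MTTM": mean_duration([i.mitigated - i.start for i in ordered]),
        "MTTR": mean_duration([i.end - i.start for i in ordered]),
    }
    # MTBI: full recovery of one incident to the first impact of the next one,
    # so it is only defined when there are at least two incidents.
    gaps = [nxt.start - prev.end for prev, nxt in zip(ordered, ordered[1:])]
    if gaps:
        metrics["MTBI"] = mean_duration(gaps)
    return metrics


# Example: two made-up incidents one day apart.
t0 = datetime(2021, 6, 1, 12, 0)
t1 = t0 + timedelta(days=1)
incidents = [
    Incident(t0, t0 + timedelta(minutes=10), t0 + timedelta(minutes=20),
             t0 + timedelta(hours=1), t0 + timedelta(hours=2)),
    Incident(t1, t1 + timedelta(minutes=30), t1 + timedelta(minutes=40),
             t1 + timedelta(hours=3), t1 + timedelta(hours=4)),
]
for name, value in incident_metrics(incidents).items():
    # Prints one line each: MTTD 0:20:00, MTTM 2:00:00, MTTR 3:00:00, MTBI 22:00:00
    print(name, value)
```

Measuring every duration from the same start point keeps the metrics comparable on one timeline; a team that starts the clock at incident creation instead (as DORA-style reporting may) would only need to change which timestamp is subtracted.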
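The severity levels pair with first-response SLA windows. A minimal sketch of checking whether a response met its SLA might look like this; the SLA values come from the severity list, while the assumption that the SLA clock starts at "impact detected" and the function name `met_first_response_sla` are illustrative only.

```python
from datetime import datetime, timedelta

# First-response SLA per severity, taken from the severity list.
FIRST_RESPONSE_SLA = {
    1: timedelta(minutes=30),
    2: timedelta(hours=4),
    3: timedelta(hours=8),
    4: timedelta(hours=24),
}


def met_first_response_sla(severity: int, detected: datetime, responded: datetime) -> bool:
    """Return True if the first response landed within the SLA window.

    Assumption: the SLA clock starts when the impact is detected.
    """
    return responded - detected <= FIRST_RESPONSE_SLA[severity]


detected = datetime(2021, 6, 1, 9, 0)
print(met_first_response_sla(1, detected, detected + timedelta(minutes=45)))  # False
print(met_first_response_sla(2, detected, detected + timedelta(minutes=45)))  # True
```

Note that this only checks the first response; as the SLA definition says, it says nothing about time to resolution.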
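The SLI and SLO definitions fit together as "measure, then compare against a target". As a rough sketch of the page-load example, the SLI below is the fraction of page loads that finish within a threshold, and the SLO is a target for that fraction; the 500 ms threshold, the 99% target and the sample data are assumptions chosen for illustration.

```python
def latency_sli(load_times_ms: list[float], threshold_ms: float = 500.0) -> float:
    """SLI: fraction of page loads that completed within threshold_ms (assumed 500 ms)."""
    if not load_times_ms:
        raise ValueError("need at least one measurement")
    fast = sum(1 for t in load_times_ms if t <= threshold_ms)
    return fast / len(load_times_ms)


def meets_slo(sli: float, target: float = 0.99) -> bool:
    """SLO check: is the measured SLI at or above the target (assumed 99%)?"""
    return sli >= target


loads = [120, 340, 480, 510, 90, 200, 650, 300, 150, 410]  # made-up samples
sli = latency_sli(loads)
print(sli)                           # 0.8
print(meets_slo(sli))                # False
print(meets_slo(sli, target=0.75))   # True
```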


Edited by Alana Bellucci
