Production monitoring: metric anomaly detection for Duo and VSA dashboards
## Overview
Establish production monitoring focused on **metric anomaly detection** across the **Duo trends dashboard** and **Value Stream Analytics (VSA) dashboard**, to proactively surface unexpected changes in metric values before they impact customers.
## Problem
Currently, there is no dedicated monitoring for metric anomalies in the Duo trends or VSA dashboards. Issues such as unexpected drops or spikes in metric values, broken calculations, or data pipeline disruptions are typically discovered reactively through support tickets or customer reports, which increases time-to-resolution and support overhead.
## Goals
- Detect and alert on metric anomalies (unexpected value changes, missing data, calculation errors) across both the Duo trends and VSA dashboards
- Enable the team to proactively identify and resolve metric regressions before they impact customers
- Reduce reactive support burden by catching anomalies early in production
## Implementation Notes
- Identify key metrics to monitor across both dashboards (e.g. Duo adoption rates, VSA stage durations, throughput metrics)
- Define anomaly detection thresholds or baselines for each metric (e.g. % deviation from rolling average)
- Integrate with existing observability tooling (e.g. Grafana, Kibana, or GitLab's internal monitoring stack)
- Alert on anomalies such as:
  - Sudden drops or spikes in metric values
  - Metrics returning null or zero unexpectedly
  - Changes in denominator values affecting rate calculations
- Consider monitoring coverage for both SaaS and self-managed deployments
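
To make the threshold idea above concrete, here is a minimal sketch of a "% deviation from rolling average" check that also flags null and unexpected zero values. It is illustrative only: the function name `detect_anomalies`, the `Anomaly` type, and the default values for `window`, `max_deviation`, and `min_history` are assumptions, not existing GitLab code or agreed thresholds.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Sequence


@dataclass
class Anomaly:
    index: int   # position of the data point in the series
    kind: str    # "missing", "zero", or "deviation"
    detail: str  # human-readable explanation for the alert


def detect_anomalies(
    values: Sequence[Optional[float]],
    window: int = 7,             # hypothetical default: 7 prior points
    max_deviation: float = 0.5,  # hypothetical default: 50% from baseline
    min_history: int = 3,        # minimum non-null points to form a baseline
) -> List[Anomaly]:
    """Flag null values, unexpected zeros, and large deviations
    from the rolling average of the preceding `window` points."""
    anomalies: List[Anomaly] = []
    for i, value in enumerate(values):
        if value is None:
            anomalies.append(Anomaly(i, "missing", "value is null"))
            continue

        # Baseline: non-null values among the previous `window` points.
        history = [v for v in values[max(0, i - window):i] if v is not None]
        if len(history) < min_history:
            continue  # not enough data to judge this point

        baseline = mean(history)
        if baseline == 0:
            continue  # relative deviation is undefined against a zero baseline

        if value == 0:
            anomalies.append(
                Anomaly(i, "zero", f"value is 0, baseline {baseline:.2f}")
            )
            continue

        deviation = abs(value - baseline) / abs(baseline)
        if deviation > max_deviation:
            anomalies.append(
                Anomaly(i, "deviation", f"{deviation:.0%} from baseline {baseline:.2f}")
            )
    return anomalies


if __name__ == "__main__":
    # Made-up daily values for a single metric, for illustration only.
    series = [100, 102, 98, 101, 99, 100, 103, 40, None, 0]
    for anomaly in detect_anomalies(series, window=7, max_deviation=0.3):
        print(anomaly)
```

The same check could be run separately on the denominator series of a rate metric to catch denominator shifts. Note that this sketch keeps anomalous points in the baseline for later points, so a real implementation would likely need to exclude flagged values or use a more robust baseline such as a rolling median.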
## Audience
- Engineering team (group::optimize)
- Support Engineers
- Customer Success Managers (CSMs)