Anomaly alerts MVC
As an MVC toward the broader anomaly detection and alerting issue (&119), we should provide alerting based on deviation from the weekly mean. This issue covers the backend part of this work; the frontend parts are covered separately in https://gitlab.com/gitlab-org/gitlab-ee/issues/5366.
- Define a recording rule for a weekly moving average of desired metrics. See this doc for more details on potential implementations.
- Define a recording rule for a 5-minute moving average of each metric, for every node/pod.
- Create an alert for each metric that fires when its 5-minute moving average is more than X standard deviations away from the weekly moving average.
- Create an alert for each metric that fires when a node's 5-minute moving average is more than X standard deviations away from the population average.
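The four steps above can be sketched in plain Python to make the intended logic concrete. This is a minimal, hypothetical illustration rather than the planned implementation: it assumes one sample per minute, uses the population standard deviation (which is what PromQL's `stddev_over_time` computes), treats a deviation in either direction as anomalous, and all function and constant names are made up for the example. In Prometheus itself the moving averages would come from recording rules, not Python lists.

```python
from statistics import mean, pstdev

# Assumption: one sample per minute, so a 5-minute window is 5 samples
# and a one-week window is 7 * 24 * 60 samples.
SAMPLES_PER_5M = 5
SAMPLES_PER_WEEK = 7 * 24 * 60


def moving_average(samples: list[float], window: int) -> float:
    """Mean of the most recent `window` samples (what a recording rule would store)."""
    if window <= 0 or len(samples) < window:
        raise ValueError("need at least `window` samples")
    return mean(samples[-window:])


def weekly_stats(samples: list[float]) -> tuple[float, float]:
    """Mean and population standard deviation over (up to) the last week of samples."""
    if not samples:
        raise ValueError("no samples")
    week = samples[-SAMPLES_PER_WEEK:]
    return mean(week), pstdev(week)


def is_anomalous(short_avg: float, weekly_mean: float, weekly_std: float, x: float = 2.0) -> bool:
    """True if the short-term average is more than x standard deviations from the weekly mean."""
    if weekly_std == 0:
        # No spread at all: any change from the mean counts as a deviation.
        return short_avg != weekly_mean
    return abs(short_avg - weekly_mean) > x * weekly_std


def find_outliers(node_avgs: dict[str, float], x: float = 2.0) -> list[str]:
    """Nodes whose 5-minute average is more than x standard deviations from the rest of the fleet."""
    outliers = []
    for node, value in node_avgs.items():
        # Compare each node against the other nodes only, so the node under test
        # does not inflate the fleet's mean and standard deviation.
        others = [v for n, v in node_avgs.items() if n != node]
        if len(others) < 2:
            continue  # too few peers to estimate a spread
        if is_anomalous(value, mean(others), pstdev(others), x):
            outliers.append(node)
    return outliers


if __name__ == "__main__":
    history = [10.0, 12.0] * (SAMPLES_PER_WEEK // 2)  # a week alternating between 10 and 12
    mu, sigma = weekly_stats(history)
    recent = history + [15.0] * SAMPLES_PER_5M
    short = moving_average(recent, SAMPLES_PER_5M)
    print("anomaly:", is_anomalous(short, mu, sigma))
    print("outliers:", find_outliers({"node-a": 10.0, "node-b": 11.0, "node-c": 10.5, "node-d": 30.0}))
```

Comparing each node only against its peers follows the "rest of the fleet" wording of the outlier definition, and it keeps a single extreme node from widening the fleet's spread enough to hide itself.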
In this case our definitions would be:
- Anomaly: a metric's 5-minute moving average is more than two standard deviations (2σ) away from its weekly moving average. Assuming a normal distribution, about 95% of values fall within 2σ of the mean, so in theory a 2σ alert fires only on values outside the range that covers 95% of normal behavior.
- Outlier: a single node's metrics are more than two standard deviations away from the mean of the rest of the fleet.
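The 95% figure comes from the normal distribution: the probability that a value lies within k standard deviations of the mean is erf(k/√2). The short Python sketch below (function names are illustrative) computes that coverage and the complementary two-sided tail, which is the share of samples a kσ alert would flag even when every value is normal.

```python
import math


def normal_coverage(k: float) -> float:
    """P(|Z| <= k) for a standard normal Z: the share of values within k sigma of the mean."""
    return math.erf(k / math.sqrt(2))


def two_sided_tail(k: float) -> float:
    """Share of normally distributed values expected to fall more than k sigma from the mean."""
    return 1.0 - normal_coverage(k)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"{k} sigma: within {normal_coverage(k):.2%}, outside {two_sided_tail(k):.2%}")
```

For k = 2 this gives about 95.45% coverage, so just under 5% of perfectly normal samples would still cross the threshold. Real metrics are often skewed or heavy-tailed, so the actual rate can differ.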
We could provide documentation and examples for how to do this with an external Prometheus server, and then support adding these rules to the managed Prometheus.
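For the external Prometheus server case, the weekly-deviation alert could be expressed as a standard Prometheus rule file. The Python sketch below builds such a rule group as a dict using the PromQL functions `avg_over_time`, `stddev_over_time`, and `abs`. The rule names, the group name, and the `node_load1` example metric are assumptions; it also assumes the metric is a gauge and covers only the anomaly alert, not the outlier alert. Since JSON is in practice valid YAML, the printed output should be loadable as a rule file, and `promtool check rules` can confirm that before deploying it.

```python
import json


def anomaly_rule_group(metric: str, x: float = 2.0) -> dict:
    """Build a Prometheus rule group (as a dict) for weekly-deviation alerting on one gauge metric.

    The recording-rule, group, and alert names are hypothetical; adjust them to your own convention.
    """
    weekly_avg = f"{metric}:avg_over_time_1w"
    weekly_std = f"{metric}:stddev_over_time_1w"
    recent_avg = f"{metric}:avg_over_time_5m"
    return {
        "groups": [
            {
                "name": f"{metric}_anomaly",
                "rules": [
                    # Weekly moving average and standard deviation of the raw series.
                    {"record": weekly_avg, "expr": f"avg_over_time({metric}[1w])"},
                    {"record": weekly_std, "expr": f"stddev_over_time({metric}[1w])"},
                    # Short-term (5-minute) moving average of the same series.
                    {"record": recent_avg, "expr": f"avg_over_time({metric}[5m])"},
                    # Fire when the 5-minute average is more than x sigma from the weekly mean.
                    {
                        "alert": f"{metric}_weekly_anomaly",
                        "expr": f"abs({recent_avg} - {weekly_avg}) > {x} * {weekly_std}",
                    },
                ],
            }
        ]
    }


if __name__ == "__main__":
    # Print the rule group; the output can be saved as a rule file.
    print(json.dumps(anomaly_rule_group("node_load1"), indent=2))
```

Evaluating a one-week range of raw samples on every rule evaluation can be costly on large installations, so a production rule set may prefer a cheaper aggregation; this sketch favors readability.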
How do users disable the pre-configured alert?
- Turn off alerts entirely
- Adjust the Alertmanager configuration to remove that specific alert
Out of scope
In this issue we are not considering the ability for users to configure their own anomaly alerts on custom or pre-provided metrics. We will only add out-of-the-box anomaly alerts to out-of-the-box metrics.