
feat: group alerts by service/type

Steve Xuereb requested to merge feat/groups-alerts-by-service into master

What

Group alerts by service

Group alerts by the type label, using Alertmanager grouping, so that only one notification is sent per service.

Alertmanager group example

[Diagram: alertmanager-grouping.excalidraw]
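A minimal sketch of what grouping by the type label does, written in Go with a hypothetical, simplified Alert type (the alert names below are made-up examples). In Alertmanager itself this is configured with group_by on the route, not in code; the sketch only illustrates the effect: alerts that share a type value collapse into one group, and one notification goes out per group.

```go
package main

import (
	"fmt"
	"sort"
)

// Alert is a hypothetical, simplified stand-in for an Alertmanager alert:
// just a name and its label set.
type Alert struct {
	Name   string
	Labels map[string]string
}

// groupBy buckets alerts by the value of a single label, mimicking the
// effect of a route-level group_by on that one label.
// Alerts missing the label all share the "" group key.
func groupBy(alerts []Alert, label string) map[string][]Alert {
	groups := make(map[string][]Alert)
	for _, a := range alerts {
		key := a.Labels[label]
		groups[key] = append(groups[key], a)
	}
	return groups
}

func main() {
	// Made-up example alerts: two for the web service, one for api.
	alerts := []Alert{
		{Name: "ApdexSLOViolation", Labels: map[string]string{"type": "web"}},
		{Name: "ErrorSLOViolation", Labels: map[string]string{"type": "web"}},
		{Name: "ApdexSLOViolation", Labels: map[string]string{"type": "api"}},
	}
	groups := groupBy(alerts, "type")

	// Sort the group keys so the output order is deterministic.
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	// One notification per group instead of one per alert.
	for _, k := range keys {
		fmt.Printf("notify type=%s: %d alert(s)\n", k, len(groups[k]))
	}
}
```

In Alertmanager, a new group also waits for group_wait before its first notification, which is what lets alerts firing within a few seconds of each other end up in the same page.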

Update template for Slack notifications

Update the alert template for Slack notifications to show which service is firing and to list the alerts that fired. https://prometheus.io/docs/alerting/latest/notifications/ is a good reference for Alertmanager templating.
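The shape of such a template can be sketched with Go's text/template package, which Alertmanager templates are built on. This is an illustration under assumptions, not the MR's template: Data and Alert are hypothetical, reduced stand-ins for Alertmanager's template data (the real structure and its extra template functions are described in the notifications reference above), and the title annotation is a made-up example.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Hypothetical, reduced stand-ins for Alertmanager's template data.
type Alert struct {
	Labels      map[string]string
	Annotations map[string]string
}

type Data struct {
	Status      string
	GroupLabels map[string]string // labels the group was formed on, e.g. type
	Alerts      []Alert           // every alert in the group
}

// One header line naming the service, then one line per alert in the group.
const slackText = `[{{ .Status }}] {{ .GroupLabels.type }} service: {{ len .Alerts }} alert(s)
{{ range .Alerts }}- {{ .Labels.alertname }}: {{ .Annotations.title }}
{{ end }}`

var slackTmpl = template.Must(template.New("slack").Parse(slackText))

// render executes the template against one alert group.
func render(d Data) (string, error) {
	var b strings.Builder
	if err := slackTmpl.Execute(&b, d); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := render(Data{
		Status:      "firing",
		GroupLabels: map[string]string{"type": "web"},
		Alerts: []Alert{
			{Labels: map[string]string{"alertname": "ApdexSLOViolation"}, Annotations: map[string]string{"title": "Apdex below SLO"}},
			{Labels: map[string]string{"alertname": "ErrorSLOViolation"}, Annotations: map[string]string{"title": "Error ratio above SLO"}},
		},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```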

[Screenshots: Slack and PagerDuty notifications, before and after this change]

Update silence button

Update the silence button to silence all firing alerts of that type instead of a single alert, for example:

[Screenshot: updated silence button]
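One way a silence button can target the whole type is to link to the Alertmanager UI's new-silence form with a filter that matches only on the type label, rather than on every label of one alert. The sketch below assumes the UI accepts a filter query parameter on #/silences/new, and the external URL is a made-up example; it is not taken from this MR's template.

```go
package main

import (
	"fmt"
	"net/url"
)

// silenceURL builds a link to the Alertmanager UI's new-silence form,
// pre-filled with a single matcher on the type label, so the resulting
// silence covers every alert of that type rather than one alert.
// Assumption: the UI reads the matcher from the "filter" query parameter.
func silenceURL(externalURL, serviceType string) string {
	filter := fmt.Sprintf(`{type=%q}`, serviceType) // e.g. {type="web"}
	return externalURL + "/#/silences/new?filter=" + url.QueryEscape(filter)
}

func main() {
	fmt.Println(silenceURL("https://alertmanager.example.com", "web"))
	// https://alertmanager.example.com/#/silences/new?filter=%7Btype%3D%22web%22%7D
}
```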

Why

As seen in gitlab-com/gl-infra&746 (closed), multiple alerts for the same service can page the SRE on-call within a few seconds of each other. Multiple pages are stressful and distracting, and it is unclear which one to act on, so send only one page per type instead.

Reference: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15765

Testing

Edited by Steve Xuereb
