Regional SLO alerts

This change completes the set of MRs that add regional monitoring for some services at GitLab.

We already have regional dashboards, for example https://dashboards.gitlab.net/d/git-regional/git-regional-detail and https://dashboards.gitlab.net/d/registry-regional/registry-regional-detail?orgId=1.

This adds monitoring for those services.

I've included some comments in the code for reviewers...

Dashboards

Each aggregation set now has Apdex and Error Rate SLO analysis dashboards.

Snapshots of these dashboards are here:

This is helpful when, say, a regional alert fires: the SLO Analysis dashboard can be used to generate a snapshot for the alert. For example, puma in us-east1-b: https://dashboards.gitlab.net/dashboard/snapshot/gARUuT3uvEQsuzwdRsQ7oWLkYhYgL1t2?orgId=1&var-PROMETHEUS_DS=Global&var-environment=gprd&var-type=git&var-stage=main&var-region=us-east1-b&var-component=puma&var-proposed_slo=NaN
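
To show which template variables scope an SLO analysis view to a single regional component, here is a rough sketch that assembles the same query string as the example URL. The helper `dashboard_query` is hypothetical and not part of this MR. The variable names are copied from the example URL, and `var-proposed_slo` is left out for brevity.

```python
from urllib.parse import urlencode


def dashboard_query(environment, service_type, stage, region, component):
    """Build the query string of dashboard variables for one regional component.

    Hypothetical helper for illustration only; the keys mirror the
    variables visible in the example snapshot URL.
    """
    params = {
        "orgId": 1,
        "var-PROMETHEUS_DS": "Global",
        "var-environment": environment,
        "var-type": service_type,
        "var-stage": stage,
        "var-region": region,
        "var-component": component,
    }
    # urlencode keeps insertion order and percent-encodes values where needed.
    return urlencode(params)


query = dashboard_query("gprd", "git", "main", "us-east1-b", "puma")
print(query)
```

The point is that `var-region` and `var-component` are what narrow the view from the whole service down to the regional component the alert fired for.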

Edited by Andrew Newdigate
