Fix SLA dashboard to utilise Thanos-aggregated service-metrics

In https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/9689, we're addressing the situation in which service-level metrics that are split across multiple Prometheus servers cause SLA metrics to under- and over-perform (mostly under!).

This fixes the Prometheus queries used to display the SLA, taking the new data into account while also providing a backwards-compatible view of the old (less correct) data.

@jivanvl I've also taken the opportunity to DRY-up the GitLab Dashboards version of the page. I've used Jsonnet to repeat the panel elements and use the service catalog as a source of truth for our key services.
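For readers unfamiliar with the approach, here is a rough sketch of the idea of generating one panel per key service from a catalog instead of hand-copying panels. It is written in Python rather than Jsonnet, purely as an illustration. The catalog shape, the `key_service` flag, the `example_sla:ratio` metric name, and the `type` label are all assumptions made for this sketch and are not taken from the actual change.

```python
import json

# Hypothetical catalog entries; the real service catalog schema is not shown here.
SERVICE_CATALOG = [
    {"name": "web", "key_service": True},
    {"name": "api", "key_service": True},
    {"name": "git", "key_service": True},
    {"name": "internal-tool", "key_service": False},
]


def sla_query(service, window="30d"):
    """Build a placeholder PromQL expression for one service.

    The metric and label names are illustrative assumptions only.
    """
    return f'avg_over_time(example_sla:ratio{{type="{service}"}}[{window}])'


def build_panels(catalog):
    """Return one panel definition per key service, in catalog order."""
    key_services = [s for s in catalog if s.get("key_service")]
    return [
        {
            "id": panel_id,
            "title": f"{service['name']} SLA",
            "type": "stat",
            "targets": [{"expr": sla_query(service["name"])}],
        }
        for panel_id, service in enumerate(key_services, start=1)
    ]


if __name__ == "__main__":
    dashboard = {"title": "SLA (sketch)", "panels": build_panels(SERVICE_CATALOG)}
    print(json.dumps(dashboard, indent=2))
```

With this shape, adding a key service to the catalog adds its panel without any further dashboard edits, which is the point of using the catalog as the source of truth.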

Snapshot https://dashboards.gitlab.net/dashboard/snapshot/bbLa2uOziz1mgWjBSqKT0AaUFlfvYMtn?orgId=1&from=now-30d&to=now


Edited by Andrew Newdigate
