chore: use unified sli registry in dashboards (!7390) · Merge requests · GitLab.com / Runbooks

Bob Van Landuyt requested to merge bvl/use-unified-registry-in-dashboards into master May 22, 2024

chore: use unified sli registry in dashboards

In gitlab-com/gl-infra/scalability#2602 (closed) we've added a recording rule registry that includes all of the metrics used in SLIs with all of their significant labels.

This included a rename of a metric, which handily did not clash with the Thanos rules.

Now that we've switched Grafana to use Mimir as the default datasource, we can use this registry to resolve which metrics to display.

chore: allow overriding gitlab-metrics-config for application SLIs

This allows overriding the recording rule registry when generating rules for application SLIs.

This means we can keep the same recordings in our Prometheus+Thanos implementation while we switch the default registry that is used by our dashboards.
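The override described above can be pictured with a minimal sketch. It is written in Python purely for illustration, not in the repository's own configuration language, and every name in it (`MetricsConfig`, `application_sli_rules`, the SLI name) is hypothetical rather than taken from the runbooks code. It assumes the default has already been switched to the unified registry and that the Prometheus+Thanos rule generation passes an explicit override:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MetricsConfig:
    # Hypothetical stand-in for gitlab-metrics-config; only the registry choice is modeled.
    recording_rule_registry: str = "unified"


DEFAULT_CONFIG = MetricsConfig()


def application_sli_rules(sli_names, config=DEFAULT_CONFIG):
    """Return one illustrative rule descriptor per SLI, tagged with the registry in use."""
    return [{"sli": name, "registry": config.recording_rule_registry} for name in sli_names]


# Callers that pass nothing follow the default (here: the unified registry).
default_rules = application_sli_rules(["example_sli"])

# A Prometheus+Thanos style caller keeps its existing recordings by overriding the registry.
thanos_config = replace(DEFAULT_CONFIG, recording_rule_registry="selective")
thanos_rules = application_sli_rules(["example_sli"], config=thanos_config)
```

The point of the design is that changing the default affects only callers that do not pass a config, so the existing recordings stay stable.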

chore: configure selective registry for regional key metrics

This lets us continue using the selective registry for globally evaluated regional recording rules.

chore: make prometheus rule group generator respect config

This makes the prometheus-service-group-generator respect the gitlab-metrics-config.

When doing this, we make sure that the GET-hybrid continues to use the NullRegistry, even when we switch the default over to use the unified registry.

This also configures the service-key-metric-rule-files that we use for Prometheus+Thanos to use the selective registry, so the recording rules there don't change when we switch the default.

fix: make combined metrics respect configured config

Before this, combined metrics would always rely on the default configuration.

In our Mimir setup, this meant that for these SLIs we would rely on high cardinality source metrics, even though recording rules were available.

For the GET-hybrid, this meant that for alerts we were generating a query that was using recording rules that weren't there.
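The fix can be pictured as passing the configured registry down to every component of a combined metric instead of letting each component fall back to the default. The following is a hedged Python sketch and not the actual implementation: the `Registry` class, the metric names, and the `recorded:...:rate_5m` naming are all made up for illustration.

```python
class Registry:
    """Hypothetical registry: knows which source metrics have a recording rule."""

    def __init__(self, recorded_metrics):
        self.recorded_metrics = frozenset(recorded_metrics)

    def resolve(self, metric):
        # Use the recording rule when one exists, otherwise query the source metric.
        if metric in self.recorded_metrics:
            return f"recorded:{metric}:rate_5m"  # made-up recording rule name
        return f"rate({metric}[5m])"


# Stand-ins for the NullRegistry (no recordings) and a registry with recordings.
NULL_REGISTRY = Registry([])
UNIFIED_REGISTRY = Registry(["http_requests_total", "grpc_requests_total"])


def combined_expressions(metrics, registry):
    """Resolve each component of a combined metric with the registry that was passed in."""
    return [registry.resolve(metric) for metric in metrics]


components = ["http_requests_total", "grpc_requests_total"]
mimir_exprs = combined_expressions(components, UNIFIED_REGISTRY)
get_hybrid_exprs = combined_expressions(components, NULL_REGISTRY)
```

With the registry passed explicitly, a Mimir-style caller resolves to recordings, while a GET-hybrid-style caller using the null registry never references recordings that do not exist.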

Edited May 23, 2024 by Bob Van Landuyt
