
fix: apply regional and service filter for regionalServiceSLIs

Bob Van Landuyt requested to merge bvl/fix-regional-service-aggregation into master

fix: apply regional and service filter for regionalServiceSLIs

This applies both the regional and service filters for regionalServiceSLIs.

For Mimir recording rules this is a no-op, as we're recording regionalServiceSLIs from the regionalComponentSLIs. The regionalComponentSLIs are recorded from the source (SLI aggregations). The regional filter there is applied using the sli.regional predicate in jsonnet, so we only generate the recording rules for the SLIs that we're interested in. This means that adding the regional filter here wouldn't do anything, because we didn't have any other recording rules for regionalComponentSLIs.

In the Thanos environment, we record the regionalComponentSLIs in Thanos from the promSourceComponent aggregation. This works because the Prometheus instance has a static label for the region it is deployed in. We did exactly the same for the regionalServiceSLIs, which do not have a component aggregation label. As a result, the regionalServiceSLIs would be a sum of all regionalComponentSLIs. This is counterintuitive: this aggregation should only include the SLIs that we've marked to be included in the service aggregation.
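To illustrate the difference, here is a minimal Python sketch (not the actual jsonnet). The `SLI` class, the `service_aggregation` flag, the `ops_rate` values, and the SLI names are all hypothetical; only the `regional` flag mirrors the sli.regional predicate mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLI:
    name: str                  # hypothetical SLI name
    regional: bool             # mirrors the sli.regional predicate
    service_aggregation: bool  # hypothetical flag: include in the service aggregation
    ops_rate: float            # hypothetical per-region operation rate


def unfiltered_service_rate(slis):
    """Sum over every component SLI, as the Thanos rules effectively did before."""
    return sum(s.ops_rate for s in slis)


def filtered_service_rate(slis):
    """Sum only SLIs that are regional AND marked for the service aggregation."""
    return sum(s.ops_rate for s in slis if s.regional and s.service_aggregation)


# Made-up example data for a single service in a single region.
SLIS = [
    SLI("sli_a", regional=True, service_aggregation=True, ops_rate=100.0),
    SLI("sli_b", regional=True, service_aggregation=True, ops_rate=50.0),
    SLI("sli_c", regional=True, service_aggregation=False, ops_rate=25.0),
    SLI("sli_d", regional=False, service_aggregation=True, ops_rate=10.0),
]

print(unfiltered_service_rate(SLIS))  # includes sli_c and sli_d
print(filtered_service_rate(SLIS))    # only sli_a and sli_b
```

In this sketch the unfiltered sum counts `sli_c` and `sli_d` even though neither should contribute to the regional service aggregation, which is the mismatch this change removes.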

The regionalServiceSLIs are only used on the regional detail dashboard and are not used in alerts.

This was discovered in gitlab-com/gl-infra/scalability#3398 (comment 1888235820)

fix: only generate serviceRegionalSLIs when needed

If a service doesn't have any regional SLIs, we don't need to generate the recording rules for it.
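The check could look roughly like the following Python sketch. The real logic lives in jsonnet; the `services_needing_regional_rules` helper, the dictionary layout, and the service names are assumptions made for illustration.

```python
def services_needing_regional_rules(services):
    """Return the names of services that have at least one regional SLI.

    `services` maps a service name to a list of SLI dicts, each with a
    boolean "regional" key (a hypothetical shape for this sketch).
    """
    return [
        name
        for name, slis in services.items()
        if any(sli["regional"] for sli in slis)
    ]


# Made-up example: only "service_a" has a regional SLI.
SERVICES = {
    "service_a": [{"name": "sli_a", "regional": True}],
    "service_b": [{"name": "sli_b", "regional": False}],
    "service_c": [],
}

print(services_needing_regional_rules(SERVICES))
```

Services without any regional SLI are skipped, so no regional service recording rules would be emitted for them.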

Edited by Bob Van Landuyt
