fix: allow global SLIs to be aggregated

Part of https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/14644.

Follow-up to !5405.


When !5405 was merged, I discovered that the following SLI metrics were not being evaluated for the new Thanos service:

  1. Component Error Ratio
  2. Component Apdex Ratio
  3. Service Error Ratio
  4. Service Apdex Ratio
  5. Service Ops Rate

[Screenshot, 2023-02-15 11:46Z (source)]

This happens because the aggregation sets use the `{monitor!="global"}` selector to fetch their source data from Prometheus.

However, the SLIs for the Thanos service are evaluated only in Thanos, so their source data carries the `monitor="global"` label and is excluded by that selector.
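To make the filtering concrete, here is a small Python sketch that emulates how a PromQL `!=` label matcher treats a few series. The metric name `example_ops_rate` and the label sets are hypothetical, chosen only to illustrate the behaviour; they are not real GitLab series.

```python
# Hypothetical illustration of how the PromQL matcher {monitor!="global"} filters series.
# The label sets below are made up for the example.


def matches_not_equal(labels: dict[str, str], name: str, value: str) -> bool:
    """Emulate a PromQL `name!="value"` label matcher.

    In PromQL a missing label behaves like an empty string, so a series
    without the label still satisfies `!=` for any non-empty value.
    """
    return labels.get(name, "") != value


series = [
    {"__name__": "example_ops_rate", "type": "web", "monitor": "default"},
    {"__name__": "example_ops_rate", "type": "api"},  # no monitor label at all
    {"__name__": "example_ops_rate", "type": "thanos", "monitor": "global"},  # recorded in Thanos
]

# Keep only the series that the {monitor!="global"} selector would return.
selected = [s for s in series if matches_not_equal(s, "monitor", "global")]
for s in selected:
    print(s["type"])
# web
# api
```

The Thanos-recorded series is the only one dropped, which is why aggregations built on this selector never see data that exists only in Thanos.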

We therefore need a means to evaluate these SLIs in Thanos, using aggregation sets recorded in Thanos.

This change allows SLIs marked with `dangerouslyThanosEvaluated` to be aggregated in Thanos using Thanos recording rules. Only SLIs carrying this flag are affected; all other SLIs are still aggregated as before.
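The routing rule can be sketched in Python as follows. This is not the actual implementation (the runbooks definitions are not Python); the `SLI` class, its `dangerously_thanos_evaluated` field (standing in for the `dangerouslyThanosEvaluated` flag) and the SLI names are hypothetical, and the sketch only shows the opt-in partitioning described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLI:
    # Hypothetical, simplified SLI definition used only for this sketch.
    name: str
    dangerously_thanos_evaluated: bool = False  # mirrors the dangerouslyThanosEvaluated flag


def partition_slis(slis: list[SLI]) -> dict[str, list[str]]:
    """Split SLIs by where their aggregation recording rules would be evaluated.

    Only SLIs explicitly flagged as Thanos-evaluated are routed to Thanos;
    every other SLI stays on the Prometheus path.
    """
    groups: dict[str, list[str]] = {"prometheus": [], "thanos": []}
    for sli in slis:
        target = "thanos" if sli.dangerously_thanos_evaluated else "prometheus"
        groups[target].append(sli.name)
    return groups


slis = [
    SLI("rails_requests"),
    SLI("thanos_query", dangerously_thanos_evaluated=True),
    SLI("gitaly_rpcs"),
]
print(partition_slis(slis))
# {'prometheus': ['rails_requests', 'gitaly_rpcs'], 'thanos': ['thanos_query']}
```

Making the flag opt-in keeps the default path unchanged, so only the few SLIs that genuinely depend on Thanos-only data take on the extra load of global evaluation.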

The missing SLI metrics should appear once this fix is merged.

Edited by Andrew Newdigate
