feat: Add recording rule duration SLI

Bob Van Landuyt requested to merge bvl-recording-rule-group-sli into master

feat: add errorRateApdex

An errorRateApdex lets us define an SLI from two counters: an error counter and a total counter. It translates into an SLI similar to our other apdexes (histogram and success-rate), for which the ideal ratio is 100%.
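To illustrate the arithmetic, here is a minimal sketch of how an error counter and a total counter could be turned into an apdex-style ratio. This is not the code from this merge request: the function name `error_rate_apdex` and its handling of an empty total are assumptions made for the example.

```python
from typing import Optional


def error_rate_apdex(error_count: float, total_count: float) -> Optional[float]:
    """Return the fraction of operations that did not error (1.0 is ideal).

    Hypothetical helper for illustration only. Returns None when there were
    no operations, because the ratio is undefined in that case.
    """
    if total_count <= 0:
        return None
    # Apdex-style ratio: "satisfactory" operations divided by all operations.
    return (total_count - error_count) / total_count


if __name__ == "__main__":
    # 25 errors out of 100 operations leaves 75% satisfactory.
    print(error_rate_apdex(25, 100))
```

With no errors the ratio is 1.0 (100%), which matches the ideal ratio of the other apdexes.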

feat: Add recording rule duration SLI

This adds a new SLI to both Thanos and Monitoring (for Prometheus). The SLI keeps an eye on rule-group durations: every evaluation whose duration exceeds its interval is counted as an error, meaning we need to work on improving the duration of that rule group.
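The counting rule described above can be sketched as follows. This is an illustration under assumptions, not the implementation in this merge request: the `RuleGroupEvaluation` type, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RuleGroupEvaluation:
    """One evaluation of a rule group (hypothetical shape for this example)."""
    group: str
    duration_seconds: float
    interval_seconds: float


def count_overruns(evaluations: List[RuleGroupEvaluation]) -> Tuple[int, int]:
    """Return (errors, total): an evaluation is an error when its duration
    is strictly longer than the interval of its rule group."""
    total = len(evaluations)
    errors = sum(
        1 for e in evaluations if e.duration_seconds > e.interval_seconds
    )
    return errors, total


if __name__ == "__main__":
    # Made-up sample data: only the second evaluation overruns its interval.
    samples = [
        RuleGroupEvaluation("group-a", 12.0, 60.0),
        RuleGroupEvaluation("group-a", 75.0, 60.0),
        RuleGroupEvaluation("group-b", 30.0, 30.0),
        RuleGroupEvaluation("group-b", 5.0, 30.0),
    ]
    print(count_overruns(samples))
```

An evaluation that takes exactly as long as its interval is not counted as an error in this sketch, following the "exceeds" wording above.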

We're recording this as an apdex because this SLI measures the latency (duration) of a recording rule group.

For gitlab-com/gl-infra/scalability#2204 (closed)

Current state of the new SLI:

Edited by Bob Van Landuyt
