Generates error ratios in a consistent manner

This is a small change extracted from !3068 (merged), to reduce the complexity of that MR.

Why?

GitLab's SLI monitoring tracks errors using error ratios, that is, "What is the ratio of requests in error to all requests?"

At present, this ratio is calculated differently in different places.

In some places, we pre-aggregate the number of errors and the number of operations into recording rules, and then divide the error rate by the operation rate.

  - record: gitlab_component_errors:rate
    expr: |
      sum by (env, environment, tier, type, stage, component) (
        gitlab_component_errors:rate{monitor!="global"}
      )
  - record: gitlab_component_ops:rate
    expr: |
      sum by (env, environment, tier, type, stage, component) (
        gitlab_component_ops:rate{monitor!="global"}
      )
  - record: gitlab_component_errors:ratio
    expr: |
      gitlab_component_errors:rate{monitor="global"}
      /
      gitlab_component_ops:rate{monitor="global"}

In other places, the ratio is calculated directly inside a single recording rule, for example:

  - record: gitlab_service_errors:ratio
    expr: |
      sum by (environment, env, tier, type, stage) (gitlab_component_errors:rate{monitor!="global"} >= 0)
      /
      sum by (environment, env, tier, type, stage) (gitlab_component_ops:rate{monitor!="global"} > 0)

This change makes all the error ratio recording rules consistent, and is yak shaving required for !3068 (merged).
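As a rough illustration of what "consistent" could mean here, the directly calculated service-level rule above can be rewritten in the pre-aggregated style of the first example. This is only a sketch: the rule names `gitlab_service_errors:rate` and `gitlab_service_ops:rate` are assumed for illustration and are not taken from this MR, and the sketch assumes the pre-aggregated form is the one being standardized on.

```yaml
  # Hypothetical sketch; gitlab_service_errors:rate and gitlab_service_ops:rate
  # are assumed names, not confirmed by this MR.
  # Step 1: pre-aggregate the error rate per service.
  - record: gitlab_service_errors:rate
    expr: |
      sum by (environment, env, tier, type, stage) (gitlab_component_errors:rate{monitor!="global"} >= 0)
  # Step 2: pre-aggregate the operation rate per service.
  - record: gitlab_service_ops:rate
    expr: |
      sum by (environment, env, tier, type, stage) (gitlab_component_ops:rate{monitor!="global"} > 0)
  # Step 3: derive the ratio from the two pre-aggregated series.
  - record: gitlab_service_errors:ratio
    expr: |
      gitlab_service_errors:rate
      /
      gitlab_service_ops:rate
```

Under these assumptions the ratio rule keeps its name and should produce the same series as before, while the two intermediate rate rules are new. That would match the outcome described below of rules being added but none removed.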

Outcomes

A few extra recording rules are added, but none are removed, and the change introduces no functional differences.

Edited by Andrew Newdigate
