Separate gprd recordings from non-prod recording rule groups in Thanos

Thanos provides a single pane of glass for all of our metrics at GitLab. This means that metrics from several environments are collected there.

Some recording rules also use the env and environment labels in their aggregation, recording metrics across several environments in a single rule group. In other words, they query several Prometheus instances (monitor: 'default') to get a global view inside a single recording rule. This means we cannot set partial_response_strategy: abort on those rules: if a Prometheus instance in gstg did not respond, we would also skip recording metrics from gprd.
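To make the trade-off concrete, here is a rough sketch of a Thanos Ruler rule file that contrasts a combined group with per-environment groups. The group names, the record name and the `http_requests_total` metric are hypothetical placeholders, not rules taken from our configuration. Using `warn` for the non-production group is likewise only an assumption.

```yaml
groups:
  # Combined group: one expression spans every environment, so it has to
  # tolerate partial responses. With "abort", a single unresponsive gstg
  # Prometheus would fail the evaluation for gprd as well.
  - name: example-all-environments        # hypothetical name
    partial_response_strategy: "warn"
    rules:
      - record: example:http_requests:rate5m   # hypothetical record
        expr: >
          sum by (env, environment, type) (
            rate(http_requests_total{monitor="default"}[5m])
          )

  # Split groups: the gprd group only selects gprd series, so "abort"
  # no longer depends on non-production Prometheus instances responding.
  - name: example-gprd                    # hypothetical name
    partial_response_strategy: "abort"
    rules:
      - record: example:http_requests:rate5m
        expr: >
          sum by (env, environment, type) (
            rate(http_requests_total{monitor="default", env="gprd"}[5m])
          )

  - name: example-non-gprd                # hypothetical name
    partial_response_strategy: "warn"     # assumption: partial data is acceptable here
    rules:
      - record: example:http_requests:rate5m
        expr: >
          sum by (env, environment, type) (
            rate(http_requests_total{monitor="default", env!="gprd"}[5m])
          )
```

Because both split groups keep `env` in the aggregation, the recorded series stay distinguishable by label even though they share a record name.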

In this issue, we should generate separate recording rule groups for production and non-production metrics. The following aggregations have been identified as aggregating across environments:

Edited by Marco Gregorius