Transactionally record error- and apdex rates for aggregation sets
Record transactional rates from source metrics
This allows specifying `errorRates` or `apdexRates` on an aggregation-set that will be used for aggregating from source metrics. Doing this will record both sides of what will become a ratio in a single recording rule, with a `recorded_rate` label added using `label_replace` signifying which rate is being recorded.
The recording rules for error rates would look as follows:

    record: 'source_error:rates_5m',
    expr: |
      label_replace(
        sum by (a,b) (
          rate(some_error_total_count{}[5m] offset 2s)
        )
        or
        (
          0 * sum by (a,b) (
            rate(some_total_count{}[5m] offset 2s)
          )
        ),
        'recorded_rate', 'error_rate', '', ''
      )
      or
      label_replace(
        sum by (a,b) (
          rate(some_total_count{}[5m] offset 2s)
        ),
        'recorded_rate', 'ops_rate', '', ''
      )

Note the fallback to `0 * ops_rate` in the `error_rate` portion. This is done to make sure that we record an error rate of 0 when the error rate is missing, as long as the operation rate is present. We do this because, more often than not, our error rate metrics are not properly initialized. If we didn't have this fallback, we wouldn't be able to record an error ratio until we've seen an error.
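
To get a feel for how often the fallback applies, a query along these lines could list the label sets that have an operation rate but no error rate at all. This is only a sketch: it reuses the placeholder metric names and the `a`, `b` labels from the example rule, and it is not part of the generated rules.

```promql
# Label sets (a, b) with an operation rate but no error series.
# These are the ones where the fallback records error_rate = 0.
sum by (a,b) (
  rate(some_total_count{}[5m])
)
unless
sum by (a,b) (
  rate(some_error_total_count{}[5m])
)
```

An empty result would suggest that every error counter is initialized and the fallback never kicks in for that metric.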
The recording rule for apdex ratios looks as follows:

    record: 'source_apdex:rates_5m',
    expr: |
      label_replace(
        sum by (a,b) (
          rate(some_apdex_success_total_count{}[5m] offset 2s)
        ),
        'recorded_rate', 'success_rate', '', ''
      )
      or
      label_replace(
        sum by (a,b) (
          rate(some_apdex_total_count{}[5m] offset 2s)
        ),
        'recorded_rate', 'apdex_weight', '', ''
      )

Here we aren't using the fallback because we expect most of our apdex measurements to be successful, meaning the metric would not often be missing.
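
As a rough sketch of how the two sides of each recording could later be combined into a ratio, the series can be divided while ignoring the `recorded_rate` label. The rule names are the ones from the examples above. The queries themselves are an assumption for illustration, since nothing here records ratios from these series yet.

```promql
# Error ratio: errors per operation, matched on the remaining labels (a, b).
source_error:rates_5m{recorded_rate="error_rate"}
/ ignoring(recorded_rate)
source_error:rates_5m{recorded_rate="ops_rate"}

# Apdex ratio: successful measurements over all apdex measurements.
source_apdex:rates_5m{recorded_rate="success_rate"}
/ ignoring(recorded_rate)
source_apdex:rates_5m{recorded_rate="apdex_weight"}
```

Because both sides come from the same rule evaluation, the numerator and denominator of each ratio cover the same time window.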
Allow generating rule files in nested directories
This allows generating rules in subdirectories for different environments.
This is not yet supported by our current Thanos and Prometheus setups, which receive their rules through the syncinator. But the new Thanos setup, already in use for `thanos-staging`, does.
It also updates the `make generate` script to delete rule files in subdirectories so they don't get left behind when regenerating.
[Thanos-staging] Transactional rates from source metrics
This adds experimental recording rules for recording aggregation-sets from source metrics in thanos-staging.
The experimental aggregation set includes the new transactional rates alongside the old rates we currently record in Prometheus. This is done for 4 services to start.
This does not yet allow recording ratios from the source metrics or in transformations. That will be done later in gitlab-com/gl-infra/scalability#2475, when we get rid of the intermediate recording rules in Prometheus.
[Thanos-staging] Record global aggregations with transactional rates
This takes the transactional rates from the source aggregation we added in the previous commit and transforms them into the service aggregation in thanos-staging.
This does not yet include using these transactional rates in the ratio recordings.