Filter generated recording rules by tenants
Spawned from #2582 (comment 1744861810)
Not all saturation points should have recording rules for all tenants. The cloudflare_data_transfer utilization metric, for example, is only relevant on the gitlab-ops tenant.
We could limit which tenants a saturation point belongs to, for example:
diff --git a/metrics-catalog/utilization/cloudflare_data_transfer.libsonnet b/metrics-catalog/utilization/cloudflare_data_transfer.libsonnet
index 351c4bc56..fd2b8cad8 100644
--- a/metrics-catalog/utilization/cloudflare_data_transfer.libsonnet
+++ b/metrics-catalog/utilization/cloudflare_data_transfer.libsonnet
@@ -6,6 +6,7 @@ local utilizationMetric = metricsCatalog.utilizationMetric;
title: 'Cloudflare Network Total Data Transfer',
unit: 'bytes',
appliesTo: ['cloudflare'],
+ tenants: ['gitlab-ops'],
description: |||
Tracks total data transfer across the cloudflare network
|||,
Alternatively, we could set this at the service level, in the same way:
diff --git a/metrics-catalog/services/thanos.jsonnet b/metrics-catalog/services/thanos.jsonnet
index a1a24d18f..9ce5d73b1 100644
--- a/metrics-catalog/services/thanos.jsonnet
+++ b/metrics-catalog/services/thanos.jsonnet
@@ -23,6 +23,7 @@ local thanosServiceSelector = { type: 'thanos', namespace: 'thanos' };
metricsCatalog.serviceDefinition({
type: 'thanos',
tier: 'inf',
+ tenants: ['gitlab-ops'],
tags: ['golang', 'thanos'],
Note that some saturation points may reference services that do not exist in the service catalog. For example, the cloudflare_data_transfer utilization metric points to a cloudflare service.
Context:
As @marcogreg noticed in this comment gitlab-com/runbooks!6746 (comment 1744620970), some utilization metrics only gather data in a specific environment. Should we somehow specify the environment(s) for which a metric generates recording rules? We could have something like this, maybe?
diff --git a/metrics-catalog/utilization/cloudflare_data_transfer.libsonnet b/metrics-catalog/utilization/cloudflare_data_transfer.libsonnet
index 351c4bc56..fd2b8cad8 100644
--- a/metrics-catalog/utilization/cloudflare_data_transfer.libsonnet
+++ b/metrics-catalog/utilization/cloudflare_data_transfer.libsonnet
@@ -6,6 +6,7 @@ local utilizationMetric = metricsCatalog.utilizationMetric;
title: 'Cloudflare Network Total Data Transfer',
unit: 'bytes',
appliesTo: ['cloudflare'],
+ environments: ['ops'],
description: |||
Tracks total data transfer across the cloudflare network
|||,
We would then filter it out of environments not present in this list. I can easily give it a try in a follow-up MR after merging gitlab-com/runbooks!6746 (merged).
That sounds like a good idea! The same will likely go for saturation points, or for some services for that matter (ops-gitlab-net being one of them).
But I think we don't want to tie it to an environment, but instead to the tenants in separateMimirRecordingSelectors, so we'd have tenants: ['gitlab-ops'] for this one. We default to running something on all tenants unless otherwise specified.
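A minimal sketch of that default, assuming a `tenants` field on each definition as proposed above. The `allTenants` list, the helper names, the `tenant-a` and `tenant-b` placeholders and the `example_metric` entry are hypothetical and only illustrate the idea; this is not the runbooks implementation.

```jsonnet
// Hypothetical list of every tenant; only 'gitlab-ops' is named in this thread.
local allTenants = ['gitlab-ops', 'tenant-a', 'tenant-b'];

// Assumed convention: a definition without a `tenants` field runs on all tenants.
local tenantsFor(definition) =
  if std.objectHas(definition, 'tenants') then definition.tenants else allTenants;

// Keep only the definitions that should produce recording rules for `tenant`.
local definitionsForTenant(definitions, tenant) = {
  [name]: definitions[name]
  for name in std.objectFields(definitions)
  if std.member(tenantsFor(definitions[name]), tenant)
};

local utilizationMetrics = {
  cloudflare_data_transfer: { appliesTo: ['cloudflare'], tenants: ['gitlab-ops'] },
  // Hypothetical metric with no `tenants` field, so it applies everywhere.
  example_metric: { appliesTo: ['example'] },
};

{
  // Names of the metrics that would get rules in each tenant.
  'gitlab-ops': std.objectFields(definitionsForTenant(utilizationMetrics, 'gitlab-ops')),
  'tenant-a': std.objectFields(definitionsForTenant(utilizationMetrics, 'tenant-a')),
}
```

With this convention, cloudflare_data_transfer would only show up under gitlab-ops, while a definition that does not set `tenants` keeps generating rules everywhere.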
I also think that we would more likely specify that at the service level, rather than the saturation-point or utilization definition. That way, if a saturation point appliesTo a service, we'll generate the rules in all tenants that this service is running in. Would that make sense?
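One way the service-level lookup could work, as a rough sketch: the tenants for a saturation point are the union of the tenants of every service it appliesTo. The `allTenants` list, the helper names, the `tenant-a` placeholder and the fallback for services missing from the catalog are assumptions made for illustration, not existing runbooks code.

```jsonnet
// Hypothetical list of every tenant; only 'gitlab-ops' is named in this thread.
local allTenants = ['gitlab-ops', 'tenant-a'];

// Trimmed-down stand-in for the service catalog.
local services = {
  thanos: { type: 'thanos', tier: 'inf', tenants: ['gitlab-ops'] },
  // Hypothetical service that does not restrict its tenants.
  example: { type: 'example', tier: 'sv' },
};

// Assumption: a service that is missing from the catalog, or that has no
// `tenants` field, falls back to all tenants.
local serviceTenants(type) =
  if std.objectHas(services, type) && std.objectHas(services[type], 'tenants')
  then services[type].tenants
  else allTenants;

// Union (sorted, de-duplicated) of the tenants of every service a point applies to.
local tenantsForSaturationPoint(point) =
  std.set(std.flattenArrays([serviceTenants(type) for type in point.appliesTo]));

{
  thanos_only: tenantsForSaturationPoint({ appliesTo: ['thanos'] }),
  thanos_and_example: tenantsForSaturationPoint({ appliesTo: ['thanos', 'example'] }),
  // 'cloudflare' is not in the catalog, so the fallback applies.
  cloudflare: tenantsForSaturationPoint({ appliesTo: ['cloudflare'] }),
}
```

Under these assumptions a point that only applies to thanos resolves to gitlab-ops, while a point that applies to an uncatalogued service still generates rules in every tenant.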
The one you bring up is a bit awkward, because there is no such thing as a cloudflare service. But I think there should be: I think we run the cloudflare exporter, and we should be able to get at least an operation rate from it 😄. I think we should create a separate issue for that as a follow-up to this project: for &1107 (closed) I think we'll live with the fact that a bunch of rule evaluations will be empty. What do you think, @stejacks-gitlab?