Filter generated recording rules by tenants

Spawned from #2582 (comment 1744861810)

Not all saturation points should have recording rules for all tenants. The cloudflare_data_transfer utilization metric, for example, is only relevant to the gitlab-ops tenant.

We could limit which tenants a saturation point belongs to, for example:

diff --git a/metrics-catalog/utilization/cloudflare_data_transfer.libsonnet b/metrics-catalog/utilization/cloudflare_data_transfer.libsonnet
index 351c4bc56..fd2b8cad8 100644
--- a/metrics-catalog/utilization/cloudflare_data_transfer.libsonnet
+++ b/metrics-catalog/utilization/cloudflare_data_transfer.libsonnet
@@ -6,6 +6,7 @@ local utilizationMetric = metricsCatalog.utilizationMetric;
     title: 'Cloudflare Network Total Data Transfer',
     unit: 'bytes',
     appliesTo: ['cloudflare'],
+    tenants: ['gitlab-ops'],
     description: |||
       Tracks total data transfer across the cloudflare network
     |||,
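
The filtering implied by such a `tenants` field could look roughly like the sketch below. It is only an illustration: `allTenants`, `tenantsFor`, `definitionsForTenant`, the `some_other_metric` entry and the tenant names other than gitlab-ops are assumptions, not existing runbooks code. It follows the suggestion from the discussion under Context that a definition without `tenants` runs on all tenants.

```jsonnet
// Hypothetical tenant list; in the runbooks this would more likely be
// derived from separateMimirRecordingSelectors.
local allTenants = ['gitlab-gprd', 'gitlab-gstg', 'gitlab-ops'];

// A definition without a `tenants` field applies to every tenant.
local tenantsFor(definition) =
  if std.objectHas(definition, 'tenants') then definition.tenants else allTenants;

// Trimmed-down stand-ins for utilization / saturation definitions.
local definitions = {
  cloudflare_data_transfer: { appliesTo: ['cloudflare'], tenants: ['gitlab-ops'] },
  some_other_metric: { appliesTo: ['thanos'] },  // hypothetical, no tenants field
};

// Names of the definitions that should get recording rules in `tenant`.
local definitionsForTenant(tenant) = [
  name
  for name in std.objectFields(definitions)
  if std.member(tenantsFor(definitions[name]), tenant)
];

// One entry per tenant, listing the definitions to generate rules for.
{
  [tenant]: definitionsForTenant(tenant)
  for tenant in allTenants
}
```

With these inputs, only the gitlab-ops entry would include cloudflare_data_transfer, while some_other_metric would show up for every tenant.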

Alternatively, we could set this at the service level, similar to the above:

diff --git a/metrics-catalog/services/thanos.jsonnet b/metrics-catalog/services/thanos.jsonnet
index a1a24d18f..9ce5d73b1 100644
--- a/metrics-catalog/services/thanos.jsonnet
+++ b/metrics-catalog/services/thanos.jsonnet
@@ -23,6 +23,7 @@ local thanosServiceSelector = { type: 'thanos', namespace: 'thanos' };
 metricsCatalog.serviceDefinition({
   type: 'thanos',
   tier: 'inf',
+  tenants: ['gitlab-ops'],

   tags: ['golang', 'thanos'],

Note that some saturation points may reference services that do not exist in the service catalog. For example, the cloudflare_data_transfer utilization metric points to a cloudflare service, which has no service definition.
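
For the service-level variant, a definition's tenants could be derived from the services listed in its `appliesTo`. The sketch below is a rough illustration under assumptions: `services`, `serviceTenants` and `tenantsForDefinition` are hypothetical names, the `web` entry and the tenant names other than gitlab-ops are made up for the example, and falling back to all tenants for a service missing from the catalog is just one possible choice.

```jsonnet
// Hypothetical tenant list, as a stand-in for the real tenant configuration.
local allTenants = ['gitlab-gprd', 'gitlab-gstg', 'gitlab-ops'];

// Trimmed-down stand-ins for service definitions.
local services = {
  thanos: { type: 'thanos', tenants: ['gitlab-ops'] },
  web: { type: 'web' },  // hypothetical service without a tenants field
};

// Tenants a service runs in. Services missing from the catalog (such as
// cloudflare) and services without `tenants` fall back to all tenants.
local serviceTenants(type) =
  if !std.objectHas(services, type) then allTenants
  else if std.objectHas(services[type], 'tenants') then services[type].tenants
  else allTenants;

// Union of the tenants of every service the definition applies to,
// sorted and de-duplicated by std.set.
local tenantsForDefinition(definition) =
  std.set(std.flattenArrays([
    serviceTenants(type)
    for type in definition.appliesTo
  ]));

{
  thanos_only: tenantsForDefinition({ appliesTo: ['thanos'] }),
  cloudflare: tenantsForDefinition({ appliesTo: ['cloudflare'] }),
  thanos_and_web: tenantsForDefinition({ appliesTo: ['thanos', 'web'] }),
}
```

Here `thanos_only` would resolve to just gitlab-ops, while `cloudflare` would still resolve to every tenant until a cloudflare service definition exists, which matches the "empty rule evaluations" trade-off mentioned in the discussion.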

Context:

@hmerscher:

As @marcogreg noticed in this comment gitlab-com/runbooks!6746 (comment 1744620970), some utilization metrics only gather data in a specific environment. Should we somehow specify the environment(s) for which a metric's recording rules are generated? We could have something like this, maybe?

diff --git a/metrics-catalog/utilization/cloudflare_data_transfer.libsonnet b/metrics-catalog/utilization/cloudflare_data_transfer.libsonnet
index 351c4bc56..fd2b8cad8 100644
--- a/metrics-catalog/utilization/cloudflare_data_transfer.libsonnet
+++ b/metrics-catalog/utilization/cloudflare_data_transfer.libsonnet
@@ -6,6 +6,7 @@ local utilizationMetric = metricsCatalog.utilizationMetric;
     title: 'Cloudflare Network Total Data Transfer',
     unit: 'bytes',
     appliesTo: ['cloudflare'],
+    environments: ['ops'],
     description: |||
       Tracks total data transfer across the cloudflare network
     |||,

And then filter the metric out of any environment not present in this list. I can easily give it a try in a follow-up MR after merging gitlab-com/runbooks!6746 (merged).

@reprazent:

That sounds like a good idea! The same will likely go for saturation points, or for some services for that matter (ops-gitlab-net being one of them).

But I think we don't want to tie it to an environment, but instead tie it to tenants in separateMimirRecordingSelectors, so we'd have tenants: ['gitlab-ops'] for this one. We default to running something on all tenants unless otherwise specified.

I also think that we would more likely specify that at the service level, rather than the saturation-point or utilization definition. That way, if a saturation point appliesTo a service, we'll generate the rules in all tenants that this service is running in. Would that make sense?

The one you bring up is a bit awkward, because there is no such thing as a cloudflare service. But I think there should be: I think we run the cloudflare exporter, and we should be able to get at least an operation rate from it 😄. I think we should create a separate issue for that as a follow-up to this project: for &1107 (closed) I think we'll live with the fact that a bunch of rule evaluations will be empty. What do you think, @stejacks-gitlab?