Avoid recording SLI metrics twice with different labels

In !3471 (merged) we added an extra set of recording rules for SLIs defined on services that had a feature category set.

We did this to avoid adding a feature category label to the "key metrics" (named gitlab_component_*). In a first attempt (!3452 (merged)), adding that label led to anomaly detection alerts: during the rollout, the separate recordings (with and without feature category) were summed together, which made it look like extra traffic when there was none.

This happened in service_ops_anomaly_detection.yml, which triggered the alerts that caused us to revert the original MR. It uses the promSourceSLI recordings directly, without actually using the aggregation set and its labels. These recordings are then used for anomaly alerts and dashboards.
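To make the double counting concrete, here is a minimal Python sketch of what happens when an aggregation sums a recorded metric without taking the feature category label into account. The label names, label values and rates are made up for illustration, and sum_by is a hypothetical stand-in for a PromQL `sum by (...)`, not the actual rule in service_ops_anomaly_detection.yml.

```python
from collections import defaultdict

# Hypothetical samples of one recorded SLI metric during the rollout.
# Each entry is (labels, operations per second). All values are invented.
samples = [
    # Old recording: no feature_category label.
    ({"type": "web", "component": "puma"}, 100.0),
    # New recording of the SAME traffic, split by feature_category.
    ({"type": "web", "component": "puma",
      "feature_category": "source_code_management"}, 60.0),
    ({"type": "web", "component": "puma",
      "feature_category": "issue_tracking"}, 40.0),
]


def sum_by(series, keys):
    """Rough stand-in for PromQL `sum by (keys) (...)`."""
    totals = defaultdict(float)
    for labels, value in series:
        group = tuple(labels.get(key, "") for key in keys)
        totals[group] += value
    return dict(totals)


# Summing every series per (type, component) counts the traffic twice.
naive = sum_by(samples, ["type", "component"])

# Restricting the sum to the series without the new label avoids that.
without_category = [s for s in samples if "feature_category" not in s[0]]
deduplicated = sum_by(without_category, ["type", "component"])

print(naive[("web", "puma")])         # both recordings summed together
print(deduplicated[("web", "puma")])  # only the original recording
```

In this sketch the naive sum reports twice the real rate, which is the kind of jump an anomaly detection rule would flag as extra traffic.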

This is the only case I found at first, because it shouted at us, but I think there could be more of these.


The following discussion from !3471 (merged) should be addressed:

  • @reprazent started a discussion: (+2 comments)

    @smcgivern This is a second take on the previous MR you reviewed. The new approach doesn't add the label to the "key metrics", avoiding the service-anomaly alerts we saw with the last MR. Instead, it records metrics that look like the ones we have for puma, so they would get aggregated up into Thanos and the group-level metrics the same way. It is inspired by what @andrewn pointed out in !3458 (comment 555065444), where he suggests using the per-prometheus metrics for Thanos recordings rather than chaining them together. We could revert !3458 (merged), but for our current purposes, we don't need to.