Update Prometheus rules using kubelet and kube-state-metrics metrics to be evaluated by Thanos instead

The partitioned Prometheus instances do not scrape the kubelet and kube-state-metrics endpoints (and some others), and rightly so, because scraping them from every partition would result in duplicated metrics in Thanos. As a consequence, the rules that combine those metrics with metrics from the targeted ServiceMonitors (for example, gitlab_component_saturation:ratio for the component kube_go_memory) fail to evaluate on those instances, resulting in empty dashboard panels and missed alerts.

A possible solution would be to move those rules to thanos-rule instead, since Thanos has access to the metrics from all Prometheus instances. One concern is the added load on the thanos-rule instances, which could possibly be addressed by migrating them to Kubernetes with some autoscaling: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/10969
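
To illustrate the kind of rule affected, here is a minimal sketch of a rule group that could be loaded by thanos-rule. It is not the actual definition behind gitlab_component_saturation:ratio: the group name, the recorded series name, and the expression are all hypothetical. It assumes the application metric `go_memstats_alloc_bytes` carries `namespace` and `pod` target labels, and that `kube_pod_container_resource_limits` from kube-state-metrics is scraped so that its `namespace` and `pod` labels refer to the described pod.

```yaml
groups:
  - name: kube-go-memory-saturation-sketch  # hypothetical group name
    rules:
      # Hypothetical recording rule, for illustration only.
      - record: example:kube_go_memory_saturation:ratio
        expr: |
          # Numerator: Go heap usage, scraped through a ServiceMonitor
          # by one of the partitioned Prometheus instances.
          sum by (namespace, pod) (go_memstats_alloc_bytes)
          /
          # Denominator: memory limits from kube-state-metrics, which the
          # partitioned instances do not scrape.
          sum by (namespace, pod) (
            kube_pod_container_resource_limits{resource="memory"}
          )
```

On a partitioned Prometheus instance the denominator returns no series, so the division yields an empty result. Evaluated by thanos-rule, both sides would be available through the Thanos query layer.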

Edited Jul 21, 2022 by Pierre Guinoiseau