Skip to content

fix: add k8s labels back to kube-state-metrics

Steve Xuereb requested to merge steveazz/add-label-allow-list into master

Background

When rolling out the prometheus helm chart upgrade in gprd we started seeing some metrics disappear. As part of the upgrade we are updating kube-state-metrics from v1.9.7 to v2.2.0.

This is because we have a recording rule kube_ingress_labels:labeled that depends on kube_ingress_labels having kubernetes labels as part of the metric label. As pointed out by Ahmad this was changed in https://github.com/kubernetes/kube-state-metrics/pull/1125 so this ended up removing some labels from our metrics which we depend on such as lbel_stage and label_tier.

Solution

Define kube-state-metrics.metricLabelsAllowlist where you specify the resource that you want and which labels.

For example, if you define deployments=[INeedThisLabel] it will add the INeedThisLabel to the metric label. Using [*] means it will add every label.

The full list of resources can be found in https://github.com/kubernetes/kube-state-metrics/blob/b730cb415234509e6a1425c79e826f2e7688d27b/internal/store/builder.go#L222-L252.

The list of resources was picked by looking at the usage of these metrics inside of our runbooks using ripgrep we can grep for kube_.*_labels where .* is for the resource. Then when we add the resources that we wanted we can filter them out to see if we missed anything runbooks master rg 'kube_.*_labels' | rg -v -e 'gitlab:kube_node_pool_label' -e 'pod' -e 'deployment' -e 'ingress' -e 'node' -e 'hpa'. Note that gitlab:kube_node_pool_label is a recording rule and not something kube-state-metrics exposes

Testing

You can test these locally in a minikube cluster helmfile -e minikube apply

Find the IP of kube-state-metrics with kubectl -n monitoring get svc gitlab-monitoring-kube-state-metrics then run the following curl requests and make sure the label_* is present.

$ curl -s 10.101.180.95:8080/metrics | grep 'kube_ingress_labels'
$ curl -s 10.101.180.95:8080/metrics | grep 'kube_node_labels'
$ curl -s 10.101.180.95:8080/metrics | grep 'kube_pod_labels'
$ curl -s 10.101.180.95:8080/metrics | grep 'kube_deployment_labels'
$ curl -s 10.101.180.95:8080/metrics | grep 'kube_horizontalpodautoscaler_labels'

Thanos links to check on pre when this is deployed:

Some other dashboard that we should look at:

To compare you can update the query to {env="grpd"} so that you can compare to what we have in gprd (known good)

reference https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/13973

Edited by Steve Xuereb

Merge request reports