fix: add k8s labels back to kube-state-metrics (!508)

Steve Xuereb requested to merge steveazz/add-label-allow-list into master Oct 20, 2021

Background

When rolling out the prometheus helm chart upgrade in gprd, we started seeing some metrics disappear. As part of that upgrade, kube-state-metrics goes from v1.9.7 to v2.2.0.

The metrics disappeared because our recording rule kube_ingress_labels:labeled depends on kube_ingress_labels carrying the Kubernetes labels as metric labels. As Ahmad pointed out, this behaviour changed in https://github.com/kubernetes/kube-state-metrics/pull/1125, which removed labels we depend on, such as label_stage and label_tier, from our metrics.

Solution

Define kube-state-metrics.metricLabelsAllowlist, which specifies, per resource, which Kubernetes labels should be exposed as metric labels.

For example, deployments=[INeedThisLabel] adds INeedThisLabel to the metric labels for deployments. Using [*] adds every label.

The full list of resources can be found in https://github.com/kubernetes/kube-state-metrics/blob/b730cb415234509e6a1425c79e826f2e7688d27b/internal/store/builder.go#L222-L252.
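As a rough illustration of the setting described above, a values fragment might look like the following. This is a sketch, not the exact change in this merge request: it assumes kube-state-metrics is configured as a subchart under the kube-state-metrics key (as the value path above suggests) and that the value accepts a list of resource=[labels] entries. INeedThisLabel is the placeholder label from the example above, and pods is used only to show the wildcard form.

```yaml
# Hypothetical values fragment; resources and labels are illustrative only.
kube-state-metrics:
  metricLabelsAllowlist:
    # Expose a single Kubernetes label on kube_deployment_labels.
    - "deployments=[INeedThisLabel]"
    # Expose every Kubernetes label on kube_pod_labels.
    - "pods=[*]"
```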

The list of resources was picked by looking at how these metrics are used in our runbooks. Using ripgrep, we can search for kube_.*_labels, where .* matches the resource. After adding the resources we want, we can filter them out to see whether we missed anything (run from the master branch of the runbooks repository):

$ rg 'kube_.*_labels' | rg -v -e 'gitlab:kube_node_pool_label' -e 'pod' -e 'deployment' -e 'ingress' -e 'node' -e 'hpa'

Note that gitlab:kube_node_pool_label is a recording rule, not something kube-state-metrics exposes.

Testing

You can test these changes locally in a minikube cluster with helmfile -e minikube apply.

Find the IP of kube-state-metrics with kubectl -n monitoring get svc gitlab-monitoring-kube-state-metrics, then run the following curl requests and make sure the label_* labels are present.

$ curl -s 10.101.180.95:8080/metrics | grep 'kube_ingress_labels'
$ curl -s 10.101.180.95:8080/metrics | grep 'kube_node_labels'
$ curl -s 10.101.180.95:8080/metrics | grep 'kube_pod_labels'
$ curl -s 10.101.180.95:8080/metrics | grep 'kube_deployment_labels'
$ curl -s 10.101.180.95:8080/metrics | grep 'kube_horizontalpodautoscaler_labels'
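
The five checks above can also be scripted so that a missing label_* is easier to spot. The following is a minimal sketch, not part of this merge request: check_label_metrics is a hypothetical helper that reads the /metrics output on stdin and reports, for each metric, whether at least one sample carries a label_* label.

```shell
# Hypothetical helper: reads Prometheus exposition text on stdin and prints
# OK or MISSING for each *_labels metric, depending on whether at least one
# sample line of that metric contains a label_* label.
# The body runs in a subshell so its variables do not leak into the caller.
check_label_metrics() (
  input="$(cat)"
  for metric in kube_ingress_labels kube_node_labels kube_pod_labels \
                kube_deployment_labels kube_horizontalpodautoscaler_labels; do
    # Match e.g. kube_pod_labels{namespace="x",pod="y",label_tier="sv"} 1
    if printf '%s\n' "$input" | grep -E "^${metric}[{][^}]*label_" >/dev/null; then
      echo "OK      ${metric}"
    else
      echo "MISSING ${metric}"
    fi
  done
)

# Usage, with the service IP found via the kubectl command above:
#   curl -s 10.101.180.95:8080/metrics | check_label_metrics
```

Any line reported as MISSING means that metric either has no samples or is exposed without Kubernetes labels, so the allowlist entry for that resource is worth double-checking.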

Thanos links to check on pre when this is deployed:

Some other dashboards that we should look at:

We should see this metric coming back:
https://dashboards.gitlab.net/d/api-main/api-overview?viewPanel=2358192786&orgId=1&var-PROMETHEUS_DS=Global&var-environment=pre&var-stage=main
https://thanos.gitlab.net/graph?g0.expr=avg_over_time(gitlab_component_ops%3Arate_5m%7Bcomponent%3D%22nginx_ingress%22%2Cenv%3D%22pre%22%2Cenvironment%3D%22pre%22%2Cmonitor%3D%22global%22%2Cstage%3D%22main%22%2Ctype%3D%22api%22%7D%5B1m%5D)&g0.tab=0&g0.stacked=0&g0.range_input=8w&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D&g0.end_input=2021-10-21%2008%3A50%3A29&g0.moment_input=2021-10-21%2008%3A50%3A29

To compare, update the query to {env="gprd"} so that you can check the result against what we have in gprd (known good).

reference https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/13973

Edited Oct 21, 2021 by Steve Xuereb
