Improve k8s CPU/Memory detection
We tried to turn on Prometheus monitoring for our cloud native Helm chart repo, and found that it didn't correctly identify all of the pods.
Current problem
Looking into the issue, the problem lies in the regex we use. As an example, here is a pod name: review-documentat-f4kyzn-redis-86f747b587-c6dfm.
Right now, in the Prometheus metric relabeling, we automatically trim off the two fields at the end of the pod name, as they are auto-generated by Kubernetes, and store the remainder in a new label. We then match the new label against $CI_ENVIRONMENT_SLUG.
The issue is that this is currently an exact match. For these charts, a service name is attached to the name of the pod, in this case redis. That service name survives the trimming and ends up being treated as part of the environment slug, which it shouldn't be, so the exact match fails.
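The failure can be reproduced outside Prometheus with a small sketch. This only emulates the behaviour described above; it is not the actual relabeling config. The slug value is an assumption inferred from the example pod name, and `trim_generated_suffix` is a hypothetical helper.

```python
import re

# Example pod name from this issue.
POD_NAME = "review-documentat-f4kyzn-redis-86f747b587-c6dfm"
# Assumed value of $CI_ENVIRONMENT_SLUG for this environment (not stated in the issue).
ENVIRONMENT_SLUG = "review-documentat-f4kyzn"


def trim_generated_suffix(pod_name: str) -> str:
    """Drop the last two hyphen-separated fields, which Kubernetes generates."""
    return re.sub(r"-[^-]+-[^-]+$", "", pod_name)


trimmed = trim_generated_suffix(POD_NAME)

# The service name ("redis") is still attached, so an exact comparison fails.
exact_match = trimmed == ENVIRONMENT_SLUG

print(trimmed)
print(exact_match)
```

The trimmed label still ends in the service name, which is why the pod is not attributed to the environment.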
Proposed solution
What I would propose is that we simply do a direct "starts with" regex query and avoid the relabeling requirement altogether. For example, we can filter on any pod name that starts with $CI_ENVIRONMENT_SLUG and have a much more reliable detection method.
To do this, we would use a simple regex match on the label, for example: =~^{$CI_ENVIRONMENT_SLUG}
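One detail worth checking before implementing this: Prometheus label matchers treat regular expressions as fully anchored, so a pattern consisting of the slug alone would most likely still behave like an exact match, and a trailing `.*` would be needed to get "starts with" semantics. The sketch below emulates that with Python's `re.fullmatch`. The slug value, the second pod name, and the helper names are assumptions for illustration only.

```python
import re

# Assumed value of $CI_ENVIRONMENT_SLUG (inferred from the example pod name).
ENVIRONMENT_SLUG = "review-documentat-f4kyzn"


def prefix_pattern(environment_slug: str) -> str:
    """Build a 'starts with' pattern: the literal slug followed by anything."""
    return re.escape(environment_slug) + ".*"


def pod_belongs_to_environment(pod_name: str, environment_slug: str) -> bool:
    # re.fullmatch emulates a fully anchored regex matcher.
    return re.fullmatch(prefix_pattern(environment_slug), pod_name) is not None


redis_pod = "review-documentat-f4kyzn-redis-86f747b587-c6dfm"
# Hypothetical pod from a different environment.
other_pod = "review-something-ab12cd-redis-86f747b587-c6dfm"

print(pod_belongs_to_environment(redis_pod, ENVIRONMENT_SLUG))
print(pod_belongs_to_environment(other_pod, ENVIRONMENT_SLUG))

# Under full anchoring, the slug on its own does not match the longer pod name.
print(re.fullmatch("^" + re.escape(ENVIRONMENT_SLUG), redis_pod) is None)
```

With this approach the service name no longer matters, since everything after the slug is accepted by the pattern.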