feat: Add and use a nodegroup CPU saturation metric for GET CNH metrics (!7507) · Merge requests · GitLab.com / Runbooks

Craig Miskell requested to merge ra-hybrid-kube-service into master Jun 14, 2024

Add and use a nodegroup CPU saturation metric for GET CNH metrics

kube_pool_cpu is oriented to .com deployments, where every service has a dedicated node pool. This is epitomised by its "appliesTo" expression, "metricsCatalog.findKubeProvisionedServicesWithDedicatedNodePool". That model does not fit GET CNH deployments, which use multi-use node groups and do not always have dedicated node pools.
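To illustrate why a per-service pool filter leaves GET CNH node groups uncovered, here is a hypothetical Python model. The `Service` fields and the service names are invented for illustration and do not mirror the real metrics catalog. The filter only mimics the intent of findKubeProvisionedServicesWithDedicatedNodePool as described above.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical, simplified catalog entry; field names are illustrative only.
@dataclass
class Service:
    type: str
    kube_provisioned: bool
    dedicated_node_pool: Optional[str]  # None when the service shares a node group


def services_with_dedicated_node_pool(services):
    """Mimic an appliesTo-style filter: only kube-provisioned services
    that own a node pool get a per-pool CPU saturation metric."""
    return [
        s.type
        for s in services
        if s.kube_provisioned and s.dedicated_node_pool is not None
    ]


# .com-style catalog (hypothetical): every kube service has its own pool.
dotcom = [
    Service("web", True, "web-pool"),
    Service("sidekiq", True, "sidekiq-pool"),
]

# GET CNH-style catalog (hypothetical): services share multi-use node groups.
get_cnh = [
    Service("webservice", True, None),
    Service("sidekiq", True, None),
]

print(services_with_dedicated_node_pool(dotcom))   # ['web', 'sidekiq']
print(services_with_dedicated_node_pool(get_cnh))  # []
```

With shared node groups no service owns a pool, so a per-pool metric selected this way applies to nothing and the CPU of those node groups goes unmeasured.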

Additionally, GET metrics do not necessarily carry the "node" label, so node_cpu_seconds_total:labeled cannot be used to obtain a type-labelled metric. However, Dedicated (the main user of the GET CNH metrics catalog) scrapes node metrics with a "type" label that records the node group name, in a format ending in "pool". Therefore this MR adds:

- A "kube" service for get-hybrid, copied from the .com equivalent. It includes the apiserver SLI simply because that seems plausibly interesting.
- A new node_group_cpu saturation metric, based on the GET/Dedicated labelling scheme, that covers all node groups (including those running mixed workloads) and applies to the kube service.
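The label selection and per-node-group aggregation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the metric definition from this MR, which is not reproduced here. The node group names, the 60-second window, and the formula (saturation as 1 minus the mean per-CPU idle fraction within each node group) are all assumptions chosen for illustration.

```python
import re
from collections import defaultdict

# Hypothetical node_cpu_seconds_total samples: (labels, counter at t0, counter at t1).
# There is no "node" label; the node group name lives in "type" (names assumed).
WINDOW_SECONDS = 60.0
SAMPLES = [
    ({"type": "webservice-pool", "instance": "a", "cpu": "0", "mode": "idle"}, 1000.0, 1030.0),
    ({"type": "webservice-pool", "instance": "a", "cpu": "1", "mode": "idle"}, 1000.0, 1045.0),
    ({"type": "webservice-pool", "instance": "b", "cpu": "0", "mode": "idle"}, 1000.0, 1015.0),
    ({"type": "supporting-pool", "instance": "c", "cpu": "0", "mode": "idle"}, 1000.0, 1054.0),
    # A VM outside any node group: its type does not end in "pool".
    ({"type": "gitaly", "instance": "d", "cpu": "0", "mode": "idle"}, 1000.0, 1006.0),
]

# Prometheus regex label matchers are fully anchored, so type=~".*pool"
# corresponds to re.fullmatch rather than re.search.
NODE_GROUP_TYPE = re.compile(r".*pool")


def node_group_cpu_saturation(samples, window_seconds):
    """Return {node group: 1 - mean per-CPU idle fraction}.

    Roughly analogous to
      1 - avg by (type) (rate(node_cpu_seconds_total{mode="idle", type=~".*pool"}[1m]))
    but uses a plain two-point delta: no counter-reset handling or extrapolation.
    """
    idle_fractions = defaultdict(list)
    for labels, start, end in samples:
        if labels["mode"] != "idle":
            continue
        if not NODE_GROUP_TYPE.fullmatch(labels.get("type", "")):
            continue
        # Idle seconds accumulated per wall-clock second = idle fraction of this CPU.
        idle_fractions[labels["type"]].append((end - start) / window_seconds)
    return {group: 1 - sum(f) / len(f) for group, f in idle_fractions.items()}


saturation = node_group_cpu_saturation(SAMPLES, WINDOW_SECONDS)
print({group: round(value, 3) for group, value in saturation.items()})
# {'webservice-pool': 0.5, 'supporting-pool': 0.1}
```

Averaging per-CPU idle fractions weights every CPU in a node group equally, regardless of which workloads are scheduled on it, which is what makes a node-group view usable for mixed-workload groups. A production rule might aggregate differently.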

It also, coincidentally, adds alerting for the recently added PVC saturation metric, because there is now a "kube" service to link it to (this was inadvertently missed when the metric was added).

