feat: Remove kube_pool_cpu from get-hybrid saturation metrics (!7412) · Merge requests · GitLab.com / Runbooks

Craig Miskell requested to merge gh-remove-kube-pool-cpu into master May 28, 2024

What

Remove kube_pool_cpu from get-hybrid saturation metrics. The limits-based alerting is still useful and desirable, so I've kept that by splitting the libsonnet into two files, one for each. The .com metrics catalog uses both (thus no effective change there), but get-hybrid uses only the limits alerting.

Why

In short, "requests" does not offer a great denominator for CPU saturation in get-hybrid deployments, particularly with mixed workloads on the support node group pool. Those needs are better served by monitoring overall saturation on a per-node-group basis, which we can do separately (either with kube_pool_cpu, or with something more custom, which is what Dedicated is likely to need). Limits should still be alerted on in case there are actual CPU limits, although in practice get-hybrid doesn't have many of those by default.
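
To illustrate the denominator problem, here is a rough sketch with made-up numbers. It is not taken from the runbooks recording rules; the `saturation` helper and all values are hypothetical. It only shows how the same CPU usage can look over-saturated against a small total of requests while being comfortable against the node group's capacity.

```python
# Hypothetical numbers for one node group; none of these come from the MR.
def saturation(used_cores: float, denominator_cores: float) -> float:
    """Return CPU saturation as used cores divided by a chosen denominator."""
    if denominator_cores <= 0:
        raise ValueError("denominator must be positive")
    return used_cores / denominator_cores


# Assumed mixed-workload pool: 16 allocatable cores, pods request only 4 in total.
allocatable_cores = 16.0
requested_cores = 4.0
used_cores = 6.0

# Against requests the pool appears 150% saturated.
by_requests = saturation(used_cores, requested_cores)

# Against node-group capacity the same usage is 37.5%.
by_node_group = saturation(used_cores, allocatable_cores)

print(f"usage / requests:    {by_requests:.1%}")
print(f"usage / allocatable: {by_node_group:.1%}")
```

With low or missing requests, the requests-based ratio is driven by how the workloads were declared rather than by how busy the nodes are, which is the gap a per-node-group measure is meant to close.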

Longer internal discussion at https://gitlab.com/gitlab-com/gl-infra/capacity-planning-trackers/gitlab-dedicated/-/issues/115#note_1924040870

Edited May 29, 2024 by Craig Miskell
