Use requested CPU resources for capacity planning

Bob Van Landuyt requested to merge bvl/kube-cpu-requests-capacity-planning into master

Requests in kube_container_cpu capacity planning

This changes the kube_container_cpu saturation point to use HPA's requests as the denominator. Comparing actual utilization against the requests shows us whether the requests are configured well.

For capacity planning we use the 0.99 quantile (99th percentile) of utilization across all containers in a service over an hour. This flattens out short peaks of utilization, which we already account for with the configured limit and throttling and which don't matter for capacity planning. We also don't want to alert when a container briefly uses more resources than requested, so alerting is disabled for this saturation point.
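To illustrate why a high quantile over an hour ignores short peaks, here is a minimal Python sketch of the aggregation described above. It is not the actual recording rule from this merge request: the function names, the nearest-rank quantile method, and the sample data are all assumptions made for illustration.

```python
import math


def quantile_nearest_rank(values, q):
    """Return the q-quantile (0 < q <= 1) of values using the nearest-rank method."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(math.ceil(q * len(ordered)), 1)  # 1-based rank
    return ordered[rank - 1]


def cpu_request_saturation(samples, q=0.99):
    """Saturation of CPU usage relative to requests.

    samples: (cpu_cores_used, cpu_cores_requested) pairs for every container
    in a service, collected over one hour (hypothetical input shape).
    """
    ratios = [used / requested for used, requested in samples if requested > 0]
    return quantile_nearest_rank(ratios, q)


# Hypothetical hour of data: 99 steady samples at 0.4 cores and one short
# spike to 1.5 cores, all against a request of 1 core.
samples = [(0.4, 1.0)] * 99 + [(1.5, 1.0)]

saturation = cpu_request_saturation(samples)
peak = max(used / requested for used, requested in samples)
print(f"p99 saturation: {saturation:.2f}, peak: {peak:.2f}")
```

With this made-up data the single spike above the request does not move the 0.99 quantile, while the raw peak does exceed 1. That is the behaviour wanted for capacity planning, and also why brief spikes are left to the limit and throttling rather than to this saturation point.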

We do keep the old saturation point around for alerting, but exclude it for capacity planning. This allows us to still alert when a container is over-utilizing CPU for too long.

This was discussed in https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/17061 and implements the CPU portion of gitlab-com/gl-infra&946.

Edited by Bob Van Landuyt