Use requested CPU resources for capacity planning
Requests in kube_container_cpu capacity planning
This changes the kube_container_cpu saturation point to use the CPU requests that the HPA scales on as the denominator. This shows whether the requests are configured well relative to actual utilization.
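A minimal sketch of the new denominator, assuming per-container CPU usage and the CPU request are both available in cores. The names `ContainerSample` and `request_saturation` are hypothetical and are not taken from the actual saturation point definition. A ratio above 1.0 means the container is using more CPU than it requested, which Kubernetes permits up to the configured limit.

```python
from dataclasses import dataclass


@dataclass
class ContainerSample:
    # Hypothetical shape: average CPU usage in cores over an interval and the
    # container's configured CPU request in cores.
    usage_cores: float
    request_cores: float


def request_saturation(sample: ContainerSample) -> float:
    """Ratio of CPU usage to CPU requests.

    1.0 means the container uses exactly what it requested; values above 1.0
    mean it is bursting past its request (allowed up to its limit).
    """
    if sample.request_cores <= 0:
        raise ValueError("container has no CPU request configured")
    return sample.usage_cores / sample.request_cores


if __name__ == "__main__":
    print(request_saturation(ContainerSample(usage_cores=0.25, request_cores=0.5)))  # 0.5
    print(request_saturation(ContainerSample(usage_cores=0.75, request_cores=0.5)))  # 1.5
```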
For capacity planning we use the 99th percentile across all containers in a service over an hour. This flattens out short utilization peaks. Those peaks are already accounted for by the configured limit and CPU throttling, so they don't matter for capacity planning.
Alerting is disabled for this saturation point: we don't want to alert when a container briefly uses more CPU than it requested.
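To illustrate why the 99th percentile flattens short peaks, here is a sketch assuming one request-based saturation sample per container per minute, pooled across all containers in the service for one hour. The helpers `percentile` and `capacity_planning_saturation` are hypothetical. The percentile uses linear interpolation between the closest ranks; the real recording rules may compute the quantile differently.

```python
import math


def percentile(values, q):
    """Percentile (q in [0, 1]) using linear interpolation between closest ranks."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def capacity_planning_saturation(samples_by_container, q=0.99):
    """Pool one hour of samples from every container in a service, then take the quantile."""
    pooled = [s for samples in samples_by_container.values() for s in samples]
    return percentile(pooled, q)


if __name__ == "__main__":
    # Hypothetical service: 3 containers, one saturation sample per minute for an hour.
    hour = {f"container-{i}": [0.6] * 60 for i in range(3)}
    hour["container-0"][30] = 2.0  # a one-minute burst well above the request
    print(max(s for v in hour.values() for s in v))  # 2.0
    print(capacity_planning_saturation(hour))  # 0.6
```

With 180 pooled samples, a single one-minute burst to 2.0 would dominate a maximum but does not move the 99th percentile, which stays at the steady-state 0.6.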
We keep the old saturation point for alerting, but exclude it from capacity planning. This lets us still alert when a container over-utilizes CPU for too long.
This was discussed in https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/17061
It implements the CPU portion of gitlab-com/gl-infra&946.