Adjust CPU requests for registry, sidekiq and gitlab-shell deployments
On some of our deployments we often see much higher CPU usage than the configured requests, particularly for registry, gitlab-shell and some sidekiq shards.
This came up when working on cpu_shares saturation metrics: gitlab-com/runbooks!3642 (closed)
This is problematic for several reasons:
- We risk running out of resources when other containers claim the CPU that those pods use in excess of their requests.
- We might deploy more pods on a single node than is desirable.
- For some deployments we don't set CPU limits, so we need to calculate CPU saturation based on CPU requests, which doesn't work well if we constantly go over our requests.
- In some cases we have an avg HPA CPU target set higher than our requests, which doesn't make sense.
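The checks behind the points above can be sketched as a small script. This is only an illustration: the `CpuSettings` class and `findings` function are hypothetical names, not existing tooling. The gitlab-shell figures are the ones quoted in this issue, while the `example-shard` numbers are invented to show the "target higher than requests" case.

```python
from dataclasses import dataclass


@dataclass
class CpuSettings:
    name: str
    request_m: int      # CPU request in millicores
    hpa_target_m: int   # HPA average CPU target in millicores
    peak_usage_m: int   # observed peak container usage in millicores


def findings(s: CpuSettings) -> list:
    """Return the problems described above that apply to one deployment."""
    out = []
    if s.hpa_target_m > s.request_m:
        out.append("HPA target above request")
    if s.peak_usage_m > s.request_m:
        out.append(f"peak usage at {s.peak_usage_m / s.request_m:.0%} of request")
    return out


deployments = [
    # Figures from this issue: 1000m requests, 800m target, peaks around 1900m.
    CpuSettings("gitlab-shell", request_m=1000, hpa_target_m=800, peak_usage_m=1900),
    # Hypothetical numbers, for illustration only.
    CpuSettings("example-shard", request_m=100, hpa_target_m=500, peak_usage_m=600),
]

for d in deployments:
    print(d.name, findings(d))
```

With real data, the peak usage would come from the Thanos queries rather than from hard-coded values.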
We should look into adjusting those settings:
- registry
  - HPA avg CPU target: 80% of requests
  - We often have single containers going above 100% of requests, so we should probably lower the target to 70% and raise requests. CPU requests are currently at 250m. With that we have around 10 pods running on each of the registry nodes, with some room left on the nodes (also for memory). How about going to 300m? Registry is written in Go and should scale quite well with the number of cores. (Thanos)
- git (only gitlab-shell is over saturation)
  - HPA avg CPU target: 800m
  - requests: 1000m
  - Several gitlab-shell containers regularly go beyond 1000m CPU usage, usually up to 1900m, in rare cases up to 2800m (Thanos). I'd suggest increasing CPU requests. At the moment we are limited more by memory than by CPU, but going to, let's say, 1800m CPU requests would probably allow fewer pods per node, which needs some calculation for node pool capacity.
- sidekiq
  - Here we need to look into each shard, but generally we have very small CPU requests compared to the avg target CPU, so we should raise requests to a value slightly higher than the target. (Thanos)
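The node pool capacity calculation mentioned for gitlab-shell could start from a rough estimate like the one below. It is a minimal sketch that only considers requests against allocatable resources and ignores daemonsets and other overhead. The function name `pods_per_node`, the node size (15000m CPU, 56000 MiB memory) and the 4000 MiB memory request are all assumptions for illustration. Only the 1000m and 1800m CPU requests come from this issue.

```python
def pods_per_node(alloc_cpu_m: int, alloc_mem_mib: int,
                  cpu_req_m: int, mem_req_mib: int) -> int:
    """Pods that fit on one node, limited by whichever resource runs out first."""
    by_cpu = alloc_cpu_m // cpu_req_m
    by_mem = alloc_mem_mib // mem_req_mib
    return min(by_cpu, by_mem)


# Hypothetical node and memory request; CPU requests as discussed above.
NODE_CPU_M = 15000
NODE_MEM_MIB = 56000
MEM_REQ_MIB = 4000

for cpu_req_m in (1000, 1800):
    n = pods_per_node(NODE_CPU_M, NODE_MEM_MIB, cpu_req_m, MEM_REQ_MIB)
    print(f"{cpu_req_m}m CPU requests -> {n} pods per node")
```

With these assumed numbers the node is memory-bound at 1000m and becomes CPU-bound at 1800m, which is the effect to check against the real node pool sizes.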
Update 2021-06-15
- MR for adjusting registry CPU requests in staging and cny: gitlab-com/gl-infra/k8s-workloads/gitlab-com!940 (merged)
- MR for adjusting gitlab-shell CPU requests and average target in zone b: gitlab-com/gl-infra/k8s-workloads/gitlab-com!941 (merged)
Edited by Henri Philipps