More CPU for prometheus gprd/regional (!340)

CPU throttling of these pods correlates with peaks in slow rule evaluation. See gitlab-com/gl-infra/production#3853 (closed) for context.

Note that, per gitlab-com/gl-infra/production#3853 (comment 522203297), these pods are, and will continue to be, scheduled on the sidekiq-urgent-other pool, not the default pool as their preferred (but not required) affinity would suggest. The nodes in this pool are the only ones that can accommodate such large pods: https://console.cloud.google.com/kubernetes/clusters/details/us-east1/gprd-gitlab-gke/nodes?project=gitlab-production
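
For readers unfamiliar with the distinction, a preferred (soft) node affinity looks roughly like the sketch below. This is illustrative only and not copied from this repository: `cloud.google.com/gke-nodepool` is the standard GKE node pool label, but the pool value and the weight are assumptions. Because the rule is a preference rather than a requirement, the scheduler is free to place the pod on another pool when no preferred node has enough allocatable CPU and memory, which is the behavior described above.

```yaml
# Illustrative sketch only; values are assumptions, not this chart's actual config.
affinity:
  nodeAffinity:
    # Soft constraint: the scheduler favors matching nodes but may fall back
    # to any other node that can fit the pod's resource requests.
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100  # assumed weight
        preference:
          matchExpressions:
            - key: cloud.google.com/gke-nodepool
              operator: In
              values:
                - default  # hypothetical pool name
```

A hard constraint would use `requiredDuringSchedulingIgnoredDuringExecution` instead, in which case a pod that fits on no matching node would stay Pending rather than land on a different pool.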

We should address this scheduling mismatch separately.

Edited Mar 04, 2021 by Craig Furman