[prometheus] gstg - Prometheus resource audit
## What and why
Issue: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/24579
## Common changes and best practices
- Add memory limits. Not having a memory limit in Kubernetes increases the chance of the pod being killed by the OOMKiller.
- Set memory request = memory limit. This is a best practice for these workloads: it gives the pod Guaranteed QoS, which again reduces the impact of the OOMKiller.
- Enable the `memory-snapshot-on-shutdown` feature (requires Prometheus >= 2.30). Prometheus writes a raw snapshot of its current in-memory state on shutdown, which can then be re-read into memory more efficiently when the server restarts, reducing startup time by 50-80%. Faster restarts: https://github.com/prometheus/prometheus/pull/7229
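As a rough sketch, these settings could be expressed on a Prometheus Operator `Prometheus` resource like the one below (the name and values are illustrative, not the exact gstg manifests):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: gitlab-monitoring  # illustrative name
spec:
  resources:
    requests:
      # Request = limit gives the pod Guaranteed QoS,
      # reducing its exposure to the OOMKiller.
      memory: 15Gi
      cpu: 1500m
    limits:
      memory: 15Gi
  # Requires Prometheus >= 2.30 and a recent Prometheus Operator.
  enableFeatures:
    - memory-snapshot-on-shutdown
```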
## Summary of changes per environment and instance
| Env | Cluster env | Prometheus instance | Memory requests | Memory limits | CPU requests | CPU limits | Dashboard |
|---|---|---|---|---|---|---|---|
| gstg | gstg-gitlab-gke | gitlab-monitoring-promethe-prometheus | 10Gi -> 15Gi | none -> 15Gi | 1500m | none | Link |
| gstg | gstg-us-east1-b | gitlab-monitoring-promethe-prometheus | 10Gi | none -> 10Gi | 1500m | none | Link |
| gstg | gstg-us-east1-c | gitlab-monitoring-promethe-prometheus | 10Gi | none -> 10Gi | 1500m | none | Link |
| gstg | gstg-us-east1-d | gitlab-monitoring-promethe-prometheus | 10Gi | none -> 10Gi | 1500m | none | Link |
| gstg | gstg-gitlab-gke | gitlab-rw-prometheus | 10Gi -> 15Gi | none -> 15Gi | 1500m | none | Link |
| gstg | gstg-us-east1-b | gitlab-rw-prometheus | 10Gi -> 7Gi | none -> 7Gi | 1500m | none | Link |
| gstg | gstg-us-east1-c | gitlab-rw-prometheus | 10Gi -> 7Gi | none -> 7Gi | 1500m | none | Link |
| gstg | gstg-us-east1-d | gitlab-rw-prometheus | 10Gi -> 7Gi | none -> 7Gi | 1500m | none | Link |
See memory usage for the last 7 days here.
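One way to check recent memory usage yourself, assuming the standard cAdvisor metrics are scraped (the label matchers below are illustrative, not the exact gstg series):

```promql
# Peak working-set memory per Prometheus pod over the last 7 days
max by (pod) (
  max_over_time(
    container_memory_working_set_bytes{container="prometheus", pod=~".*prometheus.*"}[7d]
  )
)
```

Working-set bytes is the figure the OOMKiller acts on, so sizing requests/limits against its 7-day peak (plus headroom) is a reasonable heuristic.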
## About memory-snapshot-on-shutdown
- Faster restarts mean quicker recovery. This also helps when probes fail for whatever reason and Prometheus receives a SIGTERM.
- Snapshots take additional disk space.
- Depending on how many series you have and the write speed of the disk, shutdown can take some time. We may therefore need to adjust the pod termination grace period, but that setting is not yet supported in the Prometheus Operator: https://github.com/prometheus-operator/prometheus-operator/issues/3433. At the moment it is hardcoded to 10m (600s), which I believe will be more than enough for this use case. I will test this assumption as I roll out the setting.
Edited by Raúl Naveiras