Set priorityClass on prometheus pods and change memory/retention period
What does this MR do and why?
Based on issue #3068 (closed), during rolling update process, prometheus pods reports OOM-killed while replaying WAL. In order to overcome this situation we should adapt the configuration, the aim of this MR is to decrease the retention period to 5d ( it will be the default value but it's up to end users to change it, if they want that) and set priorityClass with low priority on prometheus pod (see discussion: !5945 (comment 2853871467))
Setting a priorityClass on a pod without limits can be a benefit but also could affect the node:
- Without limits prometheus pod is able to use as much memory as possible on the node to replay its WAL
- In case of memory pressure reported by kubelet having a priorityClass with a negative value means that prometheus pod should be evicted first and rescheduled on another node with enough resources
cc: @feleouet @cristian.manda @stoub @alinhg
Related reference(s)
Test coverage
CI configuration
Below you can choose test deployment variants to run in this MR's CI.
Click to open to CI configuration
Legend:
| Icon | Meaning | Available values |
|---|---|---|
| Infra Provider |
capd, capo, capm3
|
|
| Bootstrap Provider |
kubeadm (alias kadm), rke2, okd, ck8s
|
|
| Node OS |
ubuntu, suse, na, leapmicro
|
|
| Deployment Options |
light-deploy, dev-sources, ha, misc, maxsurge-0, logging, no-logging, cilium
|
|
| Pipeline Scenarios | Available scenario list and description | |
| Enabled units | Any available units name, by default apply to management and workload cluster. Can be prefixed by mgmt: or wkld: to be applied only to a specific cluster type |
-
🎬 preview☁️ capd🚀 kadm🐧 ubuntu -
🎬 preview☁️ capo🚀 rke2🐧 suse -
🎬 preview☁️ capm3🚀 rke2🐧 ubuntu -
☁️ capd🚀 kadm🛠️ light-deploy🐧 ubuntu -
☁️ capd🚀 rke2🛠️ light-deploy🐧 suse -
☁️ capo🚀 rke2🐧 suse -
☁️ capo🚀 rke2🐧 leapmicro -
☁️ capo🚀 kadm🐧 ubuntu -
☁️ capo🚀 kadm🐧 ubuntu🟢 neuvector,mgmt:harbor -
☁️ capo🚀 rke2🎬 rolling-update🛠️ ha🐧 ubuntu -
☁️ capo🚀 kadm🎬 wkld-k8s-upgrade🐧 ubuntu -
☁️ capo🚀 rke2🎬 rolling-update-no-wkld🛠️ ha🐧 suse -
☁️ capo🚀 rke2🎬 sylva-upgrade-from-1.5.x🛠️ ha🐧 ubuntu -
☁️ capo🚀 rke2🎬 sylva-upgrade-from-1.5.x🛠️ ha,misc🐧 ubuntu -
☁️ capo🚀 rke2🛠️ ha,misc🐧 ubuntu -
☁️ capo🚀 rke2🛠️ ha,misc🐧 suse -
☁️ capm3🚀 rke2🐧 suse -
☁️ capm3🚀 kadm🐧 ubuntu -
☁️ capm3🚀 ck8s🐧 ubuntu -
☁️ capm3🚀 kadm🎬 rolling-update-no-wkld🛠️ ha,misc🐧 ubuntu -
☁️ capm3🚀 rke2🎬 wkld-k8s-upgrade🛠️ ha🐧 suse -
☁️ capm3🚀 kadm🎬 rolling-update🛠️ ha🐧 ubuntu -
☁️ capm3🚀 rke2🎬 sylva-upgrade-from-1.5.x🛠️ ha🐧 suse -
☁️ capm3🚀 rke2🛠️ misc,ha🐧 suse -
☁️ capm3🚀 rke2🎬 sylva-upgrade-from-1.5.x🛠️ ha,misc🐧 suse -
☁️ capm3🚀 kadm🎬 rolling-update🛠️ ha🐧 suse -
☁️ capm3🚀 ck8s🎬 rolling-update🛠️ ha🐧 ubuntu -
☁️ capm3🚀 rke2|okd🎬 no-update🐧 ubuntu|na
Global config for deployment pipelines
-
autorun pipelines -
allow failure on pipelines -
record sylvactl events
Notes:
- Enabling
autorunwill make deployment pipelines to be run automatically without human interaction - Disabling
allow failurewill make deployment pipelines mandatory for pipeline success. - if both
autorunandallow failureare disabled, deployment pipelines will need manual triggering but will be blocking the pipeline
Be aware: after configuration change, pipeline is not triggered automatically.
Please run it manually (by clicking the run pipeline button in Pipelines tab) or push new code.