Set priorityClass on prometheus pods and change memory/retention period (!5957) · Merge requests · Sylva-projects / sylva-core

What does this MR do and why?

Based on issue #3068 (closed): during a rolling update, Prometheus pods are reported as OOM-killed while replaying the WAL. To work around this, this MR adapts the configuration: it decreases the retention period to 5d (this becomes the default value, which end users remain free to change) and sets a low-priority priorityClass on the Prometheus pod (see discussion: !5945 (comment 2853871467)).

Setting a priorityClass on a pod without limits can be a benefit, but it can also affect the node:

Without limits, the Prometheus pod can use as much memory as is available on the node to replay its WAL.
When the kubelet reports memory pressure, a priorityClass with a negative value means that the Prometheus pod should be evicted first and rescheduled on another node with enough resources.
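
As a rough illustration of the two settings discussed above, the following is a minimal sketch and not the actual change made in this MR. It assumes the Prometheus instance is managed through a prometheus-operator `Prometheus` resource. The names `prometheus-low-priority`, `example`, and `monitoring`, as well as the value `-100`, are hypothetical. Only the 5d retention and the negative priority come from the description.

```yaml
# Hypothetical PriorityClass with a negative value, so that pods using it
# rank low when the kubelet selects pods to evict under memory pressure.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: prometheus-low-priority   # assumed name, not taken from the MR
value: -100                       # assumed value; the MR only says "negative"
globalDefault: false
description: "Low priority for Prometheus pods (illustrative only)"
---
# Hypothetical Prometheus resource referencing that class.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example                   # assumed name
  namespace: monitoring           # assumed namespace
spec:
  retention: 5d                   # default retention proposed by this MR
  priorityClassName: prometheus-low-priority
  # No resources.limits.memory is set here, matching the "pod without
  # limits" case described above.
```

In a Helm-based deployment, the same two fields would typically be set through chart values rather than by editing the resource directly. The exact value paths depend on the chart in use and are not shown here.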

cc: @feleouet @cristian.manda @stoub @alinhg

Related reference(s)

Test coverage

CI configuration

Below you can choose test deployment variants to run in this MR's CI.


Legend:

Icon	Meaning	Available values
☁️	Infra Provider	`capd`, `capo`, `capm3`
🚀	Bootstrap Provider	`kubeadm` (alias `kadm`), `rke2`, `okd`, `ck8s`
🐧	Node OS	`ubuntu`, `suse`, `na`, `leapmicro`
🛠️	Deployment Options	`light-deploy`, `dev-sources`, `ha`, `misc`, `maxsurge-0`, `logging`, `no-logging`, `cilium`
🎬	Pipeline Scenarios	Available scenario list and description
🟢	Enabled units	Any available units name, by default apply to management and workload cluster. Can be prefixed by `mgmt:` or `wkld:` to be applied only to a specific cluster type

Global config for deployment pipelines

autorun pipelines
allow failure on pipelines
record sylvactl events

Notes:

Enabling autorun makes deployment pipelines run automatically, without human interaction.
Disabling allow failure makes deployment pipelines mandatory for pipeline success.
If both autorun and allow failure are disabled, deployment pipelines need manual triggering and block the pipeline until they succeed.

Be aware: after a configuration change, the pipeline is not triggered automatically. Please run it manually (by clicking the run pipeline button in the Pipelines tab) or push new code.

Edited Oct 30, 2025 by Bogdan Antohe
