Add persistent storage to Thanos components (!4355) · Merge requests · Sylva-projects / sylva-core

Part of #2030 (closed)

2nd point

Persistent storage is desired not only for persisting data, but also to address ephemeral storage exhaustion, which leads to pod eviction and other issues. This is also described in #2175 (closed).

Changing persistence.enabled from false to true for the Thanos components deployed as StatefulSets (ruler, storegateway, receive) does not work:

thanos 44h False Helm upgrade failed for release thanos/thanos with chart thanos@15.8.0+e8c23ba5762e: failed to replace object: StatefulSet.apps "thanos-receive" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'ordinals', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden

upgrade.force: true does not fix this, as it is equivalent to helm upgrade --force, and Kubernetes does not allow modifications to certain immutable fields of a StatefulSet.
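The rejection can be illustrated with a minimal sketch. The set of mutable fields is copied from the error message above; the two specs are hypothetical, heavily trimmed examples, and the sketch assumes that enabling persistence adds a volumeClaimTemplates entry to the StatefulSet spec, which is typical for Helm charts but is not shown in this MR description.

```python
from __future__ import annotations

# Fields the API server allows to change on an existing StatefulSet,
# copied from the error message above.
MUTABLE_SPEC_FIELDS = {
    "replicas",
    "ordinals",
    "template",
    "updateStrategy",
    "persistentVolumeClaimRetentionPolicy",
    "minReadySeconds",
}


def forbidden_changes(old_spec: dict, new_spec: dict) -> list[str]:
    """Return the top-level spec fields that differ and are not mutable."""
    changed = {
        key
        for key in old_spec.keys() | new_spec.keys()
        if old_spec.get(key) != new_spec.get(key)
    }
    return sorted(changed - MUTABLE_SPEC_FIELDS)


# Hypothetical trimmed specs (names and sizes are illustrative only).
without_pvc = {"replicas": 1, "serviceName": "thanos-receive-headless"}
with_pvc = {
    "replicas": 1,
    "serviceName": "thanos-receive-headless",
    # Assumption: persistence.enabled: true adds this block.
    "volumeClaimTemplates": [
        {
            "metadata": {"name": "data"},
            "spec": {"resources": {"requests": {"storage": "8Gi"}}},
        }
    ],
}

print(forbidden_changes(without_pvc, with_pvc))  # ['volumeClaimTemplates']
```

Any non-empty result means an in-place update is rejected, so the only way to apply the change is to delete and recreate the StatefulSet.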

The MR implements:

enable persistence on storegateway and receive
add an annotation to each StatefulSet containing a short hash of the persistence values of that component
add a unit thanos-statefulsets-cleanup, a Job that checks whether the short hash of the persistence values matches the existing StatefulSet annotation

if they match -> leave the StatefulSet/Pods untouched as the existing workload is using the current values settings
if they differ -> delete the StatefulSet+Pods before the upgrade so they are recreated with the new settings
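The decision made by the cleanup Job can be sketched as follows. This is not the code from the MR, only an illustration of the comparison: the annotation key, the hash length, and the JSON canonicalisation are assumptions, and a StatefulSet without the annotation is treated here as a mismatch.

```python
import hashlib
import json

# Hypothetical annotation key; the real key used by the MR is not shown here.
ANNOTATION = "example.org/persistence-hash"


def short_hash(persistence: dict, length: int = 8) -> str:
    """Hash a persistence block deterministically and keep a short prefix."""
    # sort_keys makes the result independent of key order in the values.
    canonical = json.dumps(persistence, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


def needs_cleanup(persistence: dict, annotations: dict) -> bool:
    """True when the StatefulSet must be deleted before the upgrade."""
    # A missing annotation never equals the hash, so it also triggers cleanup.
    return annotations.get(ANNOTATION) != short_hash(persistence)


# Example: the deployed StatefulSet was rendered with persistence disabled.
old_values = {"enabled": False}
new_values = {"enabled": True, "size": "8Gi"}
deployed_annotations = {ANNOTATION: short_hash(old_values)}

# Same values as deployed: leave the StatefulSet and Pods untouched.
print(needs_cleanup(old_values, deployed_annotations))
# Changed values: delete the StatefulSet and Pods before the upgrade.
print(needs_cleanup(new_values, deployed_annotations))
```

Hashing only the persistence block keeps unrelated value changes, such as image tags or resource limits, from triggering a delete and recreate.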

This Job should run every time as most changes in persistence (storageClass, accessModes, enabled, size) need the cleanup

Tested by switching from persistence.enabled: false to true and back for all components and changing the size when persistence is enabled

CI configuration

Below you can choose test deployment variants to run in this MR's CI.

Legend:

Icon	Meaning	Available values
☁️	Infra Provider	`capd`, `capo`, `capm3`
🚀	Bootstrap Provider	`kubeadm` (alias `kadm`), `rke2`
🐧	Node OS	`ubuntu`, `suse`
🛠️	Deployment Options	`light-deploy`, `dev-sources`, `ha`, `misc`, `maxsurge-0`, `logging`
🎬	Pipeline Scenarios	Available scenario list and description

Global config for deployment pipelines

autorun pipelines
allow failure on pipelines
record sylvactl events

Notes:

Enabling autorun makes deployment pipelines run automatically, without human interaction.
Disabling allow failure makes deployment pipelines mandatory for pipeline success.
If both autorun and allow failure are disabled, deployment pipelines need manual triggering but block the pipeline.

Be aware: after a configuration change, the pipeline is not triggered automatically. Please run it manually (by clicking the run pipeline button in the Pipelines tab) or push new code.

Edited May 23, 2025 by Alin H
