Draft: Add customizable Longhorn default StorageClass via new unit to configure staleReplicaTimeout (!4672) · Merge requests · Sylva-projects / sylva-core

Background

In Sylva bare metal environments, we set the replicaReplenishmentWaitInterval in Longhorn to 3600 seconds (1 hour). This value is meant to provide enough time for a temporarily unavailable node (due to reboots, upgrades, etc.) to come back online and retain its volume replicas, avoiding unnecessary rebuilds.

However, in practice, Longhorn starts rebuilding the replicas after 30 minutes of node absence — well before the 1-hour interval expires. This premature rebuild behavior was traced to the staleReplicaTimeout parameter in the default Longhorn StorageClass, which is hardcoded to 30 minutes and not configurable through the upstream Longhorn Helm chart.

Reference: https://github.com/longhorn/longhorn/blob/3bd56d3b80c6dc8e35480c134872d64e496997cb/chart/templates/storageclass.yaml#L21C7-L21C26
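
The mismatch comes down to two settings expressed in different units. The fragment below is an illustrative sketch, not the actual Sylva configuration: it assumes the interval is passed through the upstream chart's `defaultSettings` values, and it reproduces only the single parameter line that the reference above points to.

```yaml
# Helm values fragment (assumed location of the setting; unit: seconds)
defaultSettings:
  replicaReplenishmentWaitInterval: 3600   # 1 hour

---
# Excerpt of the default StorageClass parameters rendered by the chart (unit: minutes)
parameters:
  staleReplicaTimeout: "30"                # 30 minutes, hardcoded in the template
```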

What does this MR do and why?

This MR introduces a new Sylva unit:

longhorn-storageclass-default

Purpose:

To provide a configurable Longhorn StorageClass whose parameters are aligned with the replica recovery strategy.

Key Changes:

Adds a new unit that defines a custom StorageClass for Longhorn.
Sets staleReplicaTimeout: "60" (minutes) to match the replicaReplenishmentWaitInterval of 1 hour.
Retains all default parameters (fsType, dataLocality, etc.) to maintain parity with Longhorn’s default behavior.
Makes it easier to override or extend additional StorageClass parameters in the future if needed.
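
As a rough sketch of what the new unit could render, a StorageClass along these lines would carry the 60-minute timeout. The object name, the default-class annotation, the replica count and the other parameter values are assumptions meant to mirror the upstream chart defaults. They are not copied from this MR's diff.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn                      # assumed name, same as the upstream default class
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"               # assumed upstream default
  staleReplicaTimeout: "60"           # minutes; matches replicaReplenishmentWaitInterval (3600 s)
  fsType: "ext4"                      # assumed upstream default
  dataLocality: "disabled"            # assumed upstream default
```

Keeping the manifest in a dedicated unit means any further parameter can be overridden there without patching the upstream chart template.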

Additional Notes:

Disabled the creation of the default StorageClass in the Longhorn Helm chart post-render step.
longhorn-storageclass-default is now used to manage the default StorageClass explicitly.
The unit is now enabled and applied across both management and workload clusters.

Closes: #2520 (closed)

Related reference(s)

Test coverage

CI configuration

Below you can choose test deployment variants to run in this MR's CI.


Legend:

Icon	Meaning	Available values
☁️	Infra Provider	`capd`, `capo`, `capm3`
🚀	Bootstrap Provider	`kubeadm` (alias `kadm`), `rke2`
🐧	Node OS	`ubuntu`, `suse`
🛠️	Deployment Options	`light-deploy`, `dev-sources`, `ha`, `misc`, `maxsurge-0`, `logging`, `no-logging`
🎬	Pipeline Scenarios	Available scenario list and description

☁️ capm3 🚀 rke2 🐧 suse
☁️ capm3 🚀 kadm 🐧 ubuntu
☁️ capm3 🚀 rke2 🎬 sylva-upgrade-from-1.4.x 🛠️ ha 🐧 suse

Global config for deployment pipelines

autorun pipelines
allow failure on pipelines
record sylvactl events

Notes:

Enabling autorun makes deployment pipelines run automatically, without human interaction.
Disabling allow failure makes deployment pipelines mandatory for pipeline success.
If both autorun and allow failure are disabled, deployment pipelines need manual triggering but still block the pipeline.

Be aware: after a configuration change, the pipeline is not triggered automatically. Run it manually (by clicking the run pipeline button in the Pipelines tab) or push new code.

Edited Aug 25, 2025 by Ravindra Tanwar
