Draft: Add customizable Longhorn default StorageClass via new unit to configure staleReplicaTimeout
Background
In Sylva bare metal environments, we set the replicaReplenishmentWaitInterval in Longhorn to 3600 seconds (1 hour). This value is meant to provide enough time for a temporarily unavailable node (due to reboots, upgrades, etc.) to come back online and retain its volume replicas, avoiding unnecessary rebuilds.
However, in practice, Longhorn starts rebuilding the replicas after 30 minutes of node absence, well before the 1-hour interval expires. This premature rebuild was traced to the staleReplicaTimeout parameter in the default Longhorn StorageClass, which is hardcoded to 30 minutes and is not configurable through the upstream Longhorn Helm chart.
What does this MR do and why?
This MR introduces a new Sylva unit:
longhorn-storageclass-default
Purpose:
To provide a configurable Longhorn StorageClass that is better aligned with the replica recovery strategy.
Key Changes:
- Adds a new unit that defines a custom StorageClass for Longhorn.
- Sets `staleReplicaTimeout: "60"` (minutes) to match the `replicaReplenishmentWaitInterval` of 1 hour.
- Retains all default parameters (`fsType`, `dataLocality`, etc.) to maintain parity with Longhorn’s default behavior.
- Makes it easier to override or extend additional StorageClass parameters in the future if needed.
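
For illustration, a StorageClass along the lines described above could look roughly like the sketch below. This is not the manifest shipped by the unit: the class name, the default-class annotation, and every parameter value other than `staleReplicaTimeout` are assumptions based on common Longhorn defaults and may differ from what the unit actually renders.

```yaml
# Illustrative sketch only; the manifest rendered by the
# longhorn-storageclass-default unit may differ.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn                     # assumed name
  annotations:
    # assumed: marks this class as the cluster default
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: driver.longhorn.io
allowVolumeExpansion: true           # assumed
reclaimPolicy: Delete                # assumed
volumeBindingMode: Immediate         # assumed
parameters:
  numberOfReplicas: "3"              # assumed
  # Value in minutes; 60 min matches the 3600 s
  # replicaReplenishmentWaitInterval mentioned in the Background.
  staleReplicaTimeout: "60"
  fsType: ext4                       # assumed default
  dataLocality: disabled             # assumed default
```

Keeping the timeout as a quoted string is deliberate: StorageClass `parameters` are a string-to-string map, so an unquoted `60` would be rejected by the API server.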
Additional Notes:
- Disabled the creation of the default StorageClass in the Longhorn Helm chart post-render step.
- `longhorn-storageclass-default` is now used to manage the default StorageClass explicitly.
- The unit is now enabled and applied across both management and workload clusters.
Closes: #2520 (closed)
Related reference(s)
Test coverage
CI configuration
Below you can choose test deployment variants to run in this MR's CI.
Legend:
| Icon | Meaning | Available values |
|---|---|---|
| ☁️ | Infra Provider | capd, capo, capm3 |
| 🚀 | Bootstrap Provider | kubeadm (alias kadm), rke2 |
| 🐧 | Node OS | ubuntu, suse |
| 🛠️ | Deployment Options | light-deploy, dev-sources, ha, misc, maxsurge-0, logging, no-logging |
| 🎬 | Pipeline Scenarios | Available scenario list and description |
- ☁️ capm3 🚀 rke2 🐧 suse
- ☁️ capm3 🚀 kadm 🐧 ubuntu
- ☁️ capm3 🚀 rke2 🎬 sylva-upgrade-from-1.4.x 🛠️ ha 🐧 suse
Global config for deployment pipelines
- autorun pipelines
- allow failure on pipelines
- record sylvactl events
Notes:
- Enabling `autorun` makes deployment pipelines run automatically, without human interaction.
- Disabling `allow failure` makes deployment pipelines mandatory for pipeline success.
- If both `autorun` and `allow failure` are disabled, deployment pipelines need manual triggering but still block the pipeline.
Be aware: after a configuration change, the pipeline is not triggered automatically.
Please run it manually (by clicking the run pipeline button in the Pipelines tab) or push new code.