Added test for detecting self cluster unplanned node rolling updates
What does this MR do and why?
Summary
Add detection for unplanned Machine rolling updates that occur when a cluster's own update (management or workload) triggers unwanted Machine recreation due to configuration changes.
Problem
Current detection only covers cross-cluster scenarios (!4705 (merged)) but misses self-cluster rolling updates. When updating a cluster, changes to metallb, calico, or coredns can trigger unplanned Machine rolling updates within the same cluster. The existing cluster-machines-ready reconciliation timestamp approach doesn't work for self-updates because if unwanted Machines are created, the reconciliation waits for them and completes after they exist, making the timestamp useless as a reference.
Solution implemented
The solution uses a timestamp comparison approach to detect unplanned machine rolling updates during cluster updates.
- Capture the update start time of the management and workload clusters
- Compare Machine creation times with the start time of the upgrade job
  - Machine creationTimestamp > UPGRADE_STARTED_AT → New machine (unplanned rolling update)
  - Machine creationTimestamp ≤ UPGRADE_STARTED_AT → Existing machine (expected)
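The comparison rule above can be sketched in Python as follows. This is only an illustration of the logic, not the script used by the CI jobs: the function names and the machine names in the usage example are hypothetical, and the input is assumed to be a list of Machine objects shaped like the items of `kubectl get machines -o json`.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an RFC 3339 timestamp such as '2025-01-15T10:30:00Z'."""
    # datetime.fromisoformat() rejects a trailing 'Z' before Python 3.11,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def find_unplanned_machines(machines, upgrade_started_at):
    """Return the names of Machines created strictly after the upgrade started.

    `machines` is assumed to be a list of dicts carrying metadata.name and
    metadata.creationTimestamp (hypothetical input shape for this sketch).
    """
    start = parse_ts(upgrade_started_at)
    return [
        m["metadata"]["name"]
        for m in machines
        if parse_ts(m["metadata"]["creationTimestamp"]) > start
    ]


if __name__ == "__main__":
    # Hypothetical data: one Machine predates the upgrade, one was created after it.
    sample = [
        {"metadata": {"name": "cp-0", "creationTimestamp": "2025-01-15T09:00:00Z"}},
        {"metadata": {"name": "md-0-new", "creationTimestamp": "2025-01-15T10:05:00Z"}},
    ]
    unplanned = find_unplanned_machines(sample, "2025-01-15T10:00:00.000Z")
    print("unplanned machines:", unplanned)
```

A Machine whose creationTimestamp equals UPGRADE_STARTED_AT is treated as existing, matching the ≤ rule above.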
To achieve this:
- Created reusable base templates
  - `.get-upgrade-job-timestamp`:
    - Extracts upgrade job start timestamp via GitLab API
    - Uses UPGRADE_JOB_NAME variable for flexibility
    - Exports UPGRADE_STARTED_AT for downstream use
  - `.check-rolling-updates-base`:
    - Common script logic for checking machine rolling updates
    - Dynamic namespace handling: sylva-system for management cluster, $ENV_NAME for workload cluster
    - Detects machines created after upgrade started
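The two template responsibilities can be sketched in Python as below. This is an illustration under assumptions, not the template code: it assumes the job list comes from GitLab's pipeline jobs endpoint (`GET /projects/:id/pipelines/:pipeline_id/jobs`), whose entries carry `name` and `started_at` fields. The function names, the `cluster_type` parameter, and the job name in the usage example are hypothetical.

```python
def upgrade_started_at(jobs, upgrade_job_name):
    """Return the 'started_at' timestamp of the job named `upgrade_job_name`.

    `jobs` is assumed to be the decoded JSON list returned by the GitLab
    pipeline jobs API. The returned value plays the role of UPGRADE_STARTED_AT.
    """
    for job in jobs:
        if job.get("name") == upgrade_job_name and job.get("started_at"):
            return job["started_at"]
    raise LookupError(f"no started job named {upgrade_job_name!r} in this pipeline")


def machines_namespace(cluster_type, env_name):
    """Pick the namespace holding the Machines of the cluster under test.

    Management cluster Machines live in 'sylva-system'; workload cluster
    Machines live in the namespace named after ENV_NAME.
    """
    return "sylva-system" if cluster_type == "management" else env_name


if __name__ == "__main__":
    # Hypothetical API payload, trimmed to the two fields used here.
    jobs = [
        {"name": "deploy-management-cluster", "started_at": "2025-01-15T09:00:00.000Z"},
        {"name": "update-management-cluster", "started_at": "2025-01-15T10:00:00.000Z"},
    ]
    print(upgrade_started_at(jobs, "update-management-cluster"))
    print(machines_namespace("workload", "my-env"))
```

Looking the job up by name, rather than hard-coding it, is what lets the same base template serve both the management and workload update jobs through UPGRADE_JOB_NAME.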
2 new jobs added:
- `mgmt-detect-self-rolling-updates`
- `wkld-detect-self-rolling-updates`
Passed pipelines:
- When a rolling update is not expected (simple upgrade):
- When a rolling update is expected (the test is skipped by default), to confirm that the test fails as expected:
Related reference(s)
- #2218 (closed)
- Complements !4705 (merged) (cross-cluster detection)
- Closes #2853 (closed)
Test coverage
CI configuration
Below you can choose test deployment variants to run in this MR's CI.
Legend:
| Icon | Meaning | Available values |
|---|---|---|
| ☁️ | Infra Provider | capd, capo, capm3 |
| 🚀 | Bootstrap Provider | kubeadm (alias kadm), rke2, okd, ck8s |
| 🐧 | Node OS | ubuntu, suse, na, leapmicro |
| 🛠️ | Deployment Options | light-deploy, dev-sources, ha, misc, maxsurge-0, logging, no-logging, openbao |
| 🎬 | Pipeline Scenarios | Available scenario list and description |
-
🎬 preview☁️ capd🚀 kadm🐧 ubuntu -
🎬 preview☁️ capo🚀 rke2🐧 suse -
🎬 preview☁️ capm3🚀 rke2🐧 ubuntu -
☁️ capd🚀 kadm🛠️ light-deploy🐧 ubuntu -
☁️ capd🚀 rke2🛠️ light-deploy🐧 suse -
☁️ capd🚀 rke2🐧 ubuntu -
☁️ capo🚀 rke2🐧 suse -
☁️ capo🚀 rke2🐧 leapmicro -
☁️ capo🚀 kadm🐧 ubuntu -
☁️ capm3🚀 rke2🎬 simple-update🐧 ubuntu -
☁️ capm3🚀 rke2🎬 simple-update-no-wkld🐧 ubuntu -
☁️ capo🚀 rke2🎬 rolling-update🛠️ ha🐧 ubuntu -
☁️ capo🚀 kadm🎬 wkld-k8s-upgrade🐧 ubuntu -
☁️ capo🚀 rke2🎬 rolling-update-no-wkld🛠️ ha🐧 suse -
☁️ capo🚀 rke2🎬 sylva-upgrade-from-1.5.x🛠️ ha🐧 ubuntu -
☁️ capo🚀 rke2🎬 sylva-upgrade-from-1.5.x🛠️ ha,misc🐧 ubuntu -
☁️ capo🚀 rke2🛠️ ha,misc🐧 ubuntu -
☁️ capo🚀 rke2🛠️ ha,misc,openbao🐧 suse -
☁️ capm3🚀 rke2🐧 suse -
☁️ capm3🚀 rke2🐧 ubuntu -
☁️ capm3🚀 kadm🐧 ubuntu -
☁️ capm3🚀 ck8s🐧 ubuntu -
☁️ capm3🚀 rke2🎬 rolling-update-no-wkld🛠️ ha🐧 ubuntu -
☁️ capm3🚀 rke2🎬 wkld-k8s-upgrade🛠️ ha🐧 suse -
☁️ capm3🚀 kadm🎬 rolling-update🛠️ ha🐧 ubuntu -
☁️ capm3🚀 rke2🎬 rolling-update🛠️ ha🐧 ubuntu -
☁️ capm3🚀 rke2🎬 sylva-upgrade-from-1.5.x🛠️ ha🐧 suse -
☁️ capm3🚀 rke2🛠️ misc,ha🐧 suse -
☁️ capm3🚀 rke2🎬 sylva-upgrade-from-1.5.x🛠️ ha,misc🐧 suse -
☁️ capm3🚀 kadm🎬 rolling-update🛠️ ha🐧 suse -
☁️ capm3🚀 ck8s🎬 rolling-update🛠️ ha🐧 ubuntu -
☁️ capm3🚀 rke2|okd🎬 no-update🐧 ubuntu|na
Global config for deployment pipelines
- autorun pipelines
- allow failure on pipelines
- record sylvactl events
Notes:
- Enabling `autorun` will make deployment pipelines run automatically without human interaction
- Disabling `allow failure` will make deployment pipelines mandatory for pipeline success.
- If both `autorun` and `allow failure` are disabled, deployment pipelines will need manual triggering but will block the pipeline
Be aware: after configuration change, pipeline is not triggered automatically.
Please run it manually (by clicking the run pipeline button in Pipelines tab) or push new code.