Added test for detecting self cluster unplanned node rolling updates

What does this MR do and why?

Summary

Add detection for unplanned Machine rolling updates that occur when a cluster's own update (management or workload) triggers unwanted Machine recreation due to configuration changes.

Problem

Current detection only covers cross-cluster scenarios (!4705 (merged)) and misses self-cluster rolling updates. When a cluster is updated, changes to metallb, calico, or coredns can trigger unplanned Machine rolling updates within that same cluster. The existing approach, which uses the cluster-machines-ready reconciliation timestamp as a reference, does not work for self-updates: if unwanted Machines are created, the reconciliation waits for them and completes only after they exist, so the new Machines predate the timestamp and it is useless as a reference.

Solution Implemented

The solution uses a timestamp comparison approach to detect unplanned machine rolling updates during cluster updates.

  1. Capture the update start time for the management and workload clusters
  2. Compare Machine creation times with the start time of the corresponding upgrade job
    • Machine creationTimestamp > UPGRADE_STARTED_AT → New machine (unplanned rolling update)
    • Machine creationTimestamp ≤ UPGRADE_STARTED_AT → Existing machine (expected)
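
The comparison rule above can be sketched in a few lines of Python. This is only an illustration of the rule, not the script used by the CI jobs; the function names `parse_rfc3339` and `is_unplanned` are hypothetical, and both timestamps are assumed to be RFC 3339 strings carrying a UTC offset or a trailing `Z`.

```python
from datetime import datetime


def parse_rfc3339(value: str) -> datetime:
    """Parse an RFC 3339 timestamp such as 2024-05-01T12:00:00Z."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def is_unplanned(machine_created_at: str, upgrade_started_at: str) -> bool:
    """Return True when the Machine was created strictly after the upgrade started."""
    return parse_rfc3339(machine_created_at) > parse_rfc3339(upgrade_started_at)


if __name__ == "__main__":
    started = "2024-05-01T12:00:00Z"  # illustrative UPGRADE_STARTED_AT value
    print(is_unplanned("2024-05-01T11:30:00Z", started))  # False: existing machine
    print(is_unplanned("2024-05-01T12:05:00Z", started))  # True: new machine
```

A Machine created at exactly UPGRADE_STARTED_AT counts as existing, matching the ≤ case in the list above.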

To achieve this:

  1. Created Reusable Base Templates
  • .get-upgrade-job-timestamp:

    • Extracts upgrade job start timestamp via GitLab API
    • Uses UPGRADE_JOB_NAME variable for flexibility
    • Exports UPGRADE_STARTED_AT for downstream use
  • .check-rolling-updates-base:

    • Common script logic for checking machine rolling updates
    • Dynamic namespace handling: sylva-system for management cluster, $ENV_NAME for workload cluster
    • Detects machines created after upgrade started

  2. New Jobs Added

  • mgmt-detect-self-rolling-updates
  • wkld-detect-self-rolling-updates
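
As a rough sketch of what the two base templates are described as doing, the Python below looks up the upgrade job's start time in a GitLab pipeline jobs listing, picks the namespace, and lists the Machines created after that time. It is not the MR's actual script. The helper names, the `cluster_kind` parameter, and the sample job name are hypothetical. The input shapes are assumptions: a list of job objects with `name` and `started_at` fields, and a Kubernetes list object as returned by `kubectl get machines -o json`. Choosing the latest start when a job was retried is also an assumption.

```python
from datetime import datetime
from typing import List, Optional


def parse_rfc3339(value: str) -> datetime:
    """Parse an RFC 3339 timestamp, accepting a trailing "Z" on older Pythons."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def find_upgrade_started_at(jobs: List[dict], upgrade_job_name: str) -> Optional[str]:
    """Return started_at of the named job, or None if it never started."""
    started = [
        job["started_at"]
        for job in jobs
        if job.get("name") == upgrade_job_name and job.get("started_at")
    ]
    # Assumption: if the job was retried, the most recent start is the reference.
    return max(started, key=parse_rfc3339) if started else None


def machines_namespace(cluster_kind: str, env_name: str) -> str:
    """sylva-system for the management cluster, ENV_NAME for a workload cluster."""
    return "sylva-system" if cluster_kind == "management" else env_name


def new_machines(machine_list: dict, upgrade_started_at: str) -> List[str]:
    """Names of Machines whose creationTimestamp is after the upgrade start."""
    threshold = parse_rfc3339(upgrade_started_at)
    return [
        item["metadata"]["name"]
        for item in machine_list.get("items", [])
        if parse_rfc3339(item["metadata"]["creationTimestamp"]) > threshold
    ]


if __name__ == "__main__":
    # Hypothetical sample data standing in for the API and kubectl output.
    jobs = [{"name": "update-management-cluster", "started_at": "2024-05-01T12:00:00.000Z"}]
    machines = {"items": [
        {"metadata": {"name": "cp-0", "creationTimestamp": "2024-05-01T09:00:00Z"}},
        {"metadata": {"name": "cp-1", "creationTimestamp": "2024-05-01T12:10:00Z"}},
    ]}
    started_at = find_upgrade_started_at(jobs, "update-management-cluster")
    print(machines_namespace("management", "my-env"), new_machines(machines, started_at))
```

A non-empty result from `new_machines` would correspond to the failure case of the detection jobs.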

Passed Pipeline:

  1. When a rolling update is not expected (simple upgrade):
  2. When a rolling update is expected (the test is skipped by default), to confirm that the test fails as expected:

Test coverage

CI configuration

Below you can choose test deployment variants to run in this MR's CI.

Legend:

Icon | Meaning | Available values
☁️ | Infra Provider | capd, capo, capm3
🚀 | Bootstrap Provider | kubeadm (alias kadm), rke2, okd, ck8s
🐧 | Node OS | ubuntu, suse, na, leapmicro
🛠️ | Deployment Options | light-deploy, dev-sources, ha, misc, maxsurge-0, logging, no-logging, openbao
🎬 | Pipeline Scenarios | Available scenario list and description

  • 🎬 preview ☁️ capd 🚀 kadm 🐧 ubuntu

  • 🎬 preview ☁️ capo 🚀 rke2 🐧 suse

  • 🎬 preview ☁️ capm3 🚀 rke2 🐧 ubuntu

  • ☁️ capd 🚀 kadm 🛠️ light-deploy 🐧 ubuntu

  • ☁️ capd 🚀 rke2 🛠️ light-deploy 🐧 suse

  • ☁️ capd 🚀 rke2 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🐧 suse

  • ☁️ capo 🚀 rke2 🐧 leapmicro

  • ☁️ capo 🚀 kadm 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🎬 simple-update 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🎬 simple-update-no-wkld 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🎬 rolling-update 🛠️ ha 🐧 ubuntu

  • ☁️ capo 🚀 kadm 🎬 wkld-k8s-upgrade 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🎬 rolling-update-no-wkld 🛠️ ha 🐧 suse

  • ☁️ capo 🚀 rke2 🎬 sylva-upgrade-from-1.5.x 🛠️ ha 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🎬 sylva-upgrade-from-1.5.x 🛠️ ha,misc 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🛠️ ha,misc 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🛠️ ha,misc,openbao 🐧 suse

  • ☁️ capm3 🚀 rke2 🐧 suse

  • ☁️ capm3 🚀 rke2 🐧 ubuntu

  • ☁️ capm3 🚀 kadm 🐧 ubuntu

  • ☁️ capm3 🚀 ck8s 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🎬 rolling-update-no-wkld 🛠️ ha 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🎬 wkld-k8s-upgrade 🛠️ ha 🐧 suse

  • ☁️ capm3 🚀 kadm 🎬 rolling-update 🛠️ ha 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🎬 rolling-update 🛠️ ha 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🎬 sylva-upgrade-from-1.5.x 🛠️ ha 🐧 suse

  • ☁️ capm3 🚀 rke2 🛠️ misc,ha 🐧 suse

  • ☁️ capm3 🚀 rke2 🎬 sylva-upgrade-from-1.5.x 🛠️ ha,misc 🐧 suse

  • ☁️ capm3 🚀 kadm 🎬 rolling-update 🛠️ ha 🐧 suse

  • ☁️ capm3 🚀 ck8s 🎬 rolling-update 🛠️ ha 🐧 ubuntu

  • ☁️ capm3 🚀 rke2|okd 🎬 no-update 🐧 ubuntu|na

Global config for deployment pipelines

  • autorun pipelines
  • allow failure on pipelines
  • record sylvactl events

Notes:

  • Enabling autorun makes deployment pipelines run automatically, without human interaction.
  • Disabling allow failure makes deployment pipelines mandatory for pipeline success.
  • If both autorun and allow failure are disabled, deployment pipelines need manual triggering but still block the pipeline.

Be aware: after a configuration change, the pipeline is not triggered automatically. Please run it manually (by clicking the run pipeline button in the Pipelines tab) or push new code.

Edited by Nitin Sharma
