CI: add a test to detect unplanned node rolling updates

We have observed cases where upgrading a CAPI provider controller triggers a node rolling update on all clusters (mgmt cluster and workload clusters).

We would need a CI job to catch such cases.

The test could check, after the update of the mgmt cluster and before the update of the workload cluster, that no Machine in the workload cluster namespace was created after the deployment, i.e. that none has a creationTimestamp later than the end of the deployment of the workload cluster. That end date can be obtained from the most recent sylvactl/reconcileCompletedAt.xxxx annotation of the sylva-units-status cluster-machines-ready Kustomization.
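A minimal sketch of the comparison described above, assuming the Machine list is available as parsed JSON (for instance from `kubectl get machines -n <namespace> -o json`) and that the deployment end date has already been extracted from the annotation as a UTC timestamp in the same format as creationTimestamp. The function names, the sample data and the annotation value format are hypothetical and only serve as illustration:

```python
from datetime import datetime, timezone


def parse_k8s_time(value: str) -> datetime:
    """Parse a UTC timestamp such as 2025-06-19T10:00:00Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def machines_created_after(machine_list: dict, deployment_end: str) -> list:
    """Return the names of Machines created after the deployment ended.

    machine_list is a parsed Kubernetes list object (with an "items" key).
    deployment_end is assumed to come from the most recent
    reconcileCompletedAt annotation; its exact format is an assumption.
    """
    cutoff = parse_k8s_time(deployment_end)
    return [
        machine["metadata"]["name"]
        for machine in machine_list.get("items", [])
        if parse_k8s_time(machine["metadata"]["creationTimestamp"]) > cutoff
    ]


# Hypothetical sample data: one Machine predates the deployment end,
# the other was created afterwards (a sign of an unplanned rolling update).
sample_machines = {
    "items": [
        {"metadata": {"name": "md-0-old", "creationTimestamp": "2025-06-19T09:00:00Z"}},
        {"metadata": {"name": "md-0-new", "creationTimestamp": "2025-06-19T11:30:00Z"}},
    ]
}

unexpected = machines_created_after(sample_machines, "2025-06-19T10:00:00Z")
if unexpected:
    print("Unplanned rolling update suspected, new Machines:", unexpected)
```

In a CI job, a non-empty result would fail the test; the same check could be run against the mgmt cluster Machines as well.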

Edit (@tmmorin 2025-06-19):

To take into account the work done to avoid node rolling updates when the settings of metallb, metallb-resources, calico and coredns change, this CI test would need to be run with values that change those settings.

/cc @feleouet

Edited Jun 19, 2025 by Thomas Morin