cluster-machines-ready: more info on missing post-commands-executed annotation

When cluster-machines-ready times out because a Node is missing the post-commands-executed annotation, we need an easy way to tell which node (or nodes) is affected.

This MR improves cluster-machines-ready so that it lists the names of those nodes.
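As an illustration only (this is not the code from the MR), the kind of filtering involved can be sketched as below. The helper name `nodes_missing_annotation` is hypothetical, and so is the `ANNOTATION_KEY` placeholder in the comment: the text above only gives the short name of the annotation, not its full key. The sketch relies on kubectl printing `<none>` in custom-columns output when a field is absent.

```shell
# Hypothetical helper, not taken from cluster-machines-ready.sh.
# Reads "NAME VALUE" lines on stdin (one Node per line) and prints the names
# of the Nodes whose VALUE column is "<none>", i.e. the field was absent.
nodes_missing_annotation() {
    awk '$2 == "<none>" { print $1 }'
}

# Possible use against a live cluster (ANNOTATION_KEY is a placeholder for the
# full annotation key; dots inside the key need to be escaped as "\."):
#
#   kubectl get nodes --no-headers \
#     -o custom-columns="NAME:.metadata.name,VALUE:.metadata.annotations.${ANNOTATION_KEY}" \
#     | nodes_missing_annotation
```

Keeping the filter separate from the kubectl call makes it easy to try on canned input before pointing it at a real cluster.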

This MR also changes the output so that detailed information on Machines and Nodes is printed only if the script did not succeed.
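One common way to get this "details only on failure" behaviour in a shell script is an exit handler that checks the exit status. The sketch below assumes that pattern; the function name `report_if_failed` is hypothetical and the actual script may be structured differently.

```shell
# Hypothetical helper, not taken from cluster-machines-ready.sh.
# $1 is the exit status of the waiting logic; the detailed report is printed
# only when that status is non-zero.
report_if_failed() {
    status="$1"
    if [ "$status" -ne 0 ]; then
        echo "======================================================"
        echo "failure waiting for all Machines and Nodes to be ready"
        # A real script would dump the Machines and Nodes here,
        # for instance with "kubectl get ... -o wide".
    fi
}

# Possible wiring inside a script, so the report runs on any exit path:
#
#   trap 'report_if_failed "$?"' EXIT
```

With the trap in place, a successful run stays quiet, while a timeout or any other non-zero exit prints the summary.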

Testing

I simulated a failing case on a setup where I had removed the post-commands-executed annotation from a Node with kubectl edit:

$ WAIT_TIMEOUT=10m CONTROL_PLANE=rke2controlplane CLUSTER_NAME=management-cluster timeout -v 25s charts/sylva-units/scripts/cluster-machines-ready.sh

result:

======================================================
failure waiting for all Machines and Nodes to be ready

--- summary of resources

-- Control plane:
NAME                               AGE
management-cluster-control-plane   448d
Complété

-- Machines:

NAME                                       CLUSTER              NODENAME                                   PROVIDERID                                          PHASE     AGE     VERSION
management-cluster-control-plane-4tchz     management-cluster   management-cluster-cp-dfac864565-mht2z     openstack:///9ab62912-6540-48ea-92c0-4416b7a991e8   Running   8d      v1.33.7+rke2r1
management-cluster-control-plane-j9c66     management-cluster   management-cluster-cp-dfac864565-f898z     openstack:///b6b165fc-89a2-42e7-91c0-3d310c56fda2   Running   8d      v1.33.7+rke2r1
management-cluster-control-plane-v7vbk     management-cluster   management-cluster-cp-dfac864565-w97bs     openstack:///3efd1f67-df37-4cac-8fd4-31ea37bf31dc   Running   7d22h   v1.33.7+rke2r1
management-cluster-md-ubuntu-bfc4b-6l5mp   management-cluster   management-cluster-md-ubuntu-bfc4b-6l5mp   openstack:///838cffac-1f7c-4e19-be4a-56cc06124b22   Running   6d22h   v1.33.7+rke2r1
management-cluster-md-ubuntu-bfc4b-cwnzk   management-cluster   management-cluster-md-ubuntu-bfc4b-cwnzk   openstack:///98833d47-efce-4af0-a0e4-00f36fdca419   Running   6d22h   v1.33.7+rke2r1
management-cluster-md0-2rwbh-7kgr6         management-cluster   management-cluster-md0-2rwbh-7kgr6         openstack:///79b98ae7-ee16-437d-862b-5ed0ac71e9fc   Running   6d22h   v1.33.7+rke2r1
management-cluster-md0-2rwbh-gmxjd         management-cluster   management-cluster-md0-2rwbh-gmxjd         openstack:///0aac0a67-3da3-443d-ada4-41b4c343cd3f   Running   7d22h   v1.33.7+rke2r1
management-cluster-md0-2rwbh-v4hf6         management-cluster   management-cluster-md0-2rwbh-v4hf6         openstack:///59c6c8cd-09a9-4bf8-a1c4-5d74f6095fea   Running   8d      v1.33.7+rke2r1

-- Nodes:

NAME                                       STATUS   ROLES                       AGE     VERSION          INTERNAL-IP      EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION               CONTAINER-RUNTIME
management-cluster-cp-dfac864565-f898z     Ready    control-plane,etcd,master   8d      v1.33.7+rke2r1   172.20.136.234   <none>        openSUSE Leap 15.6   6.4.0-150600.23.81-default   containerd://2.1.5-k3s1
management-cluster-cp-dfac864565-mht2z     Ready    control-plane,etcd,master   8d      v1.33.7+rke2r1   172.20.136.111   <none>        openSUSE Leap 15.6   6.4.0-150600.23.81-default   containerd://2.1.5-k3s1
management-cluster-cp-dfac864565-w97bs     Ready    control-plane,etcd,master   7d22h   v1.33.7+rke2r1   172.20.136.20    <none>        openSUSE Leap 15.6   6.4.0-150600.23.81-default   containerd://2.1.5-k3s1
management-cluster-md-ubuntu-bfc4b-6l5mp   Ready    <none>                      6d22h   v1.33.7+rke2r1   172.20.136.113   <none>        Ubuntu 24.04.3 LTS   6.8.0-90-generic             containerd://2.1.5-k3s1
management-cluster-md-ubuntu-bfc4b-cwnzk   Ready    <none>                      6d22h   v1.33.7+rke2r1   172.20.136.13    <none>        Ubuntu 24.04.3 LTS   6.8.0-90-generic             containerd://2.1.5-k3s1
management-cluster-md0-2rwbh-7kgr6         Ready    <none>                      6d22h   v1.33.7+rke2r1   172.20.136.53    <none>        openSUSE Leap 15.6   6.4.0-150600.23.81-default   containerd://2.1.5-k3s1
management-cluster-md0-2rwbh-gmxjd         Ready    <none>                      7d22h   v1.33.7+rke2r1   172.20.136.79    <none>        openSUSE Leap 15.6   6.4.0-150600.23.81-default   containerd://2.1.5-k3s1
management-cluster-md0-2rwbh-v4hf6         Ready    <none>                      8d      v1.33.7+rke2r1   172.20.136.147   <none>        openSUSE Leap 15.6   6.4.0-150600.23.81-default   containerd://2.1.5-k3s1

-- Some nodes do not have the 'post-commands-executed' annotations
   (ie. they did not successfully reach end of the post commands execution)

management-cluster-cp-dfac864565-f898z

CI configuration

Below you can choose test deployment variants to run in this MR's CI.

Legend:

Icon   Meaning              Available values
☁️     Infra Provider       capd, capo, capm3
🚀     Bootstrap Provider   kubeadm (alias kadm), rke2, okd, ck8s
🐧     Node OS              ubuntu, suse, na, leapmicro
🛠️     Deployment Options   light-deploy, dev-sources, ha, misc, maxsurge-0, logging, no-logging, cilium
🎬     Pipeline Scenarios   Available scenario list and description
🟢     Enabled units        Any available unit name; by default applies to both management and workload clusters. Can be prefixed with mgmt: or wkld: to apply only to a specific cluster type
🏗️     Target platform      Can be used to select a specific deployment environment (e.g. real-bmh for capm3)
  • 🎬 preview ☁️ capd 🚀 kadm 🐧 ubuntu

  • 🎬 preview ☁️ capo 🚀 rke2 🐧 suse

  • 🎬 preview ☁️ capm3 🚀 rke2 🐧 ubuntu

  • ☁️ capd 🚀 kadm 🛠️ light-deploy 🐧 ubuntu

  • ☁️ capd 🚀 rke2 🛠️ light-deploy 🐧 suse

  • ☁️ capo 🚀 rke2 🐧 suse

  • ☁️ capo 🚀 rke2 🐧 leapmicro

  • ☁️ capo 🚀 kadm 🐧 ubuntu

  • ☁️ capo 🚀 kadm 🐧 ubuntu 🟢 neuvector,mgmt:harbor

  • ☁️ capo 🚀 rke2 🎬 rolling-update 🛠️ ha 🐧 ubuntu

  • ☁️ capo 🚀 kadm 🎬 wkld-k8s-upgrade 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🎬 rolling-update-no-wkld 🛠️ ha 🐧 suse

  • ☁️ capo 🚀 rke2 🎬 sylva-upgrade 🛠️ ha 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🎬 sylva-upgrade-from-1.6.x 🛠️ ha,misc 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🛠️ ha,misc 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🛠️ ha,misc,openbao 🐧 suse

  • ☁️ capo 🚀 rke2 🐧 suse 🎬 upgrade-from-prev-tag

  • ☁️ capm3 🚀 rke2 🐧 suse

  • ☁️ capm3 🚀 kadm 🐧 ubuntu

  • ☁️ capm3 🚀 ck8s 🐧 ubuntu

  • ☁️ capm3 🚀 kadm 🎬 rolling-update-no-wkld 🛠️ ha,misc 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🎬 wkld-k8s-upgrade 🛠️ ha 🐧 suse

  • ☁️ capm3 🚀 kadm 🎬 rolling-update 🛠️ ha 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🎬 upgrade-from-prev-release-branch 🛠️ ha 🐧 suse

  • ☁️ capm3 🚀 rke2 🛠️ misc,ha 🐧 suse

  • ☁️ capm3 🚀 rke2 🎬 sylva-upgrade 🛠️ ha,misc 🐧 suse

  • ☁️ capm3 🚀 kadm 🎬 rolling-update 🛠️ ha 🐧 suse

  • ☁️ capm3 🚀 ck8s 🎬 rolling-update 🛠️ ha 🐧 ubuntu

  • ☁️ capm3 🚀 rke2|okd 🎬 no-update 🐧 ubuntu|na

  • ☁️ capm3 🚀 rke2 🐧 suse 🎬 upgrade-from-release-1.5

  • ☁️ capm3 🚀 rke2 🐧 suse 🎬 upgrade-to-main

Global config for deployment pipelines

  • autorun pipelines
  • allow failure on pipelines
  • record sylvactl events

Notes:

  • Enabling autorun makes deployment pipelines run automatically, without human interaction.
  • Disabling allow failure makes deployment pipelines mandatory for pipeline success.
  • If both autorun and allow failure are disabled, deployment pipelines need manual triggering but still block the pipeline.

Be aware: after a configuration change, the pipeline is not triggered automatically. Please run it manually (by clicking the run pipeline button in the Pipelines tab) or push new code.

Edited by Thomas Morin
