kubevirt-test-vms unit stuck for ~1 hour, then fails, but the job succeeds afterwards

Upgrade scenario from 1.3 to main.

I faced an issue where my update-management-cluster job was stuck at the following step:

2025/05/22 09:45:40.944638 Kustomization/kubevirt-test-vms state changed: HealthCheckFailed - health check failed after 30.020228958s: timeout waiting for: [VirtualMachine/kubevirt-tests/cirros-vm status: 'InProgress']

It stays at this step for about an hour, then the unit fails with:

2025/05/22 09:45:40.944638 Kustomization/kubevirt-test-vms state changed: HealthCheckFailed - health check failed after 30.020228958s: timeout waiting for: [VirtualMachine/kubevirt-tests/cirros-vm status: 'InProgress']
2025/05/22 09:46:38.372277 Command timeout exceeded
Timed-out waiting for the following resources to be ready:
IDENTIFIER                                   STATUS     REASON MESSAGE
Kustomization/sylva-system/kubevirt-test-vms InProgress        Kustomization generation is 3, but latest observed generation is 2
╰┄╴VirtualMachine/kubevirt-tests/cirros-vm   InProgress        VirtualMachine generation is 2, but latest observed generation is 1
   ╰┄╴┬┄┄[Conditions]
      ├┄╴Ready                               True
      ├┄╴LiveMigratable                      True
      ╰┄╴RestartRequired                     True              a non-live-updatable field was changed in the template spec

After this failure, the job succeeds:

2025/05/22 09:45:40.944638 Kustomization/kubevirt-test-vms state changed: HealthCheckFailed - health check failed after 30.020228958s: timeout waiting for: [VirtualMachine/kubevirt-tests/cirros-vm status: 'InProgress']
2025/05/22 09:46:38.372277 Command timeout exceeded
Timed-out waiting for the following resources to be ready:
IDENTIFIER                                   STATUS     REASON MESSAGE
Kustomization/sylva-system/kubevirt-test-vms InProgress        Kustomization generation is 3, but latest observed generation is 2
╰┄╴VirtualMachine/kubevirt-tests/cirros-vm   InProgress        VirtualMachine generation is 2, but latest observed generation is 1
   ╰┄╴┬┄┄[Conditions]
      ├┄╴Ready                               True
      ├┄╴LiveMigratable                      True
      ╰┄╴RestartRequired                     True              a non-live-updatable field was changed in the template spec
✔ Sylva is ready, everything deployed in management cluster
   Management cluster nodes:
NAME                                            STATUS   ROLES                       AGE    VERSION
mgmt-1830539845-rke2-capo-cp-682135b673-6ghxz   Ready    control-plane,etcd,master   130m   v1.31.7+rke2r1
mgmt-1830539845-rke2-capo-cp-682135b673-bzf6f   Ready    control-plane,etcd,master   136m   v1.31.7+rke2r1
mgmt-1830539845-rke2-capo-cp-682135b673-zkxr5   Ready    control-plane,etcd,master   144m   v1.31.7+rke2r1
mgmt-1830539845-rke2-capo-md0-pd254-64qsn       Ready    <none>                      123m   v1.31.7+rke2r1
mgmt-1830539845-rke2-capo-md0-pd254-kkhpb       Ready    <none>                      115m   v1.31.7+rke2r1
mgmt-1830539845-rke2-capo-md0-pd254-p7k5q       Ready    <none>                      119m   v1.31.7+rke2r1
🎉 All done
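
For reference, the log itself points at the cause: the upgrade changed a non-live-updatable field in the VM template spec, so KubeVirt sets RestartRequired and the VirtualMachine only converges after a restart. The commands below are just a sketch of how to confirm this on the management cluster (namespace and VM name taken from the log above, flux and virtctl assumed to be available):

# Check whether the Kustomization is still waiting on the VM health check
flux get kustomizations kubevirt-test-vms -n sylva-system

# Check the RestartRequired condition reported by KubeVirt
kubectl -n kubevirt-tests get virtualmachine cirros-vm \
  -o jsonpath='{.status.conditions[?(@.type=="RestartRequired")].status}'

# If it is True, restarting the VM should let the health check pass
virtctl restart cirros-vm -n kubevirt-tests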

job link: https://gitlab.com/sylva-projects/sylva-core/-/jobs/10113967042#L2939

cc @marc.bailly1 @satyawanj
