kubevirt-test-vms unit is stuck for ~1 hour and then fails, but the job succeeds afterwards
Upgrade scenario from 1.3 to main.
I hit an issue where my update-management-cluster job was stuck at the following step:
2025/05/22 09:45:40.944638 Kustomization/kubevirt-test-vms state changed: HealthCheckFailed - health check failed after 30.020228958s: timeout waiting for: [VirtualMachine/kubevirt-tests/cirros-vm status: 'InProgress']
It stays at this step for about 1 hour, and then the unit fails with:
2025/05/22 09:45:40.944638 Kustomization/kubevirt-test-vms state changed: HealthCheckFailed - health check failed after 30.020228958s: timeout waiting for: [VirtualMachine/kubevirt-tests/cirros-vm status: 'InProgress']
2025/05/22 09:46:38.372277 Command timeout exceeded
Timed-out waiting for the following resources to be ready:
IDENTIFIER                                     STATUS      REASON  MESSAGE
Kustomization/sylva-system/kubevirt-test-vms   InProgress          Kustomization generation is 3, but latest observed generation is 2
╰┄╴VirtualMachine/kubevirt-tests/cirros-vm     InProgress          VirtualMachine generation is 2, but latest observed generation is 1
   ╰┄╴┬┄┄[Conditions]
      ├┄╴Ready                                 True
      ├┄╴LiveMigratable                        True
      ╰┄╴RestartRequired                       True                a non-live-updatable field was changed in the template spec
After this failure, the job nevertheless succeeds:
✔ Sylva is ready, everything deployed in management cluster
Management cluster nodes:
NAME                                            STATUS   ROLES                       AGE    VERSION
mgmt-1830539845-rke2-capo-cp-682135b673-6ghxz   Ready    control-plane,etcd,master   130m   v1.31.7+rke2r1
mgmt-1830539845-rke2-capo-cp-682135b673-bzf6f   Ready    control-plane,etcd,master   136m   v1.31.7+rke2r1
mgmt-1830539845-rke2-capo-cp-682135b673-zkxr5   Ready    control-plane,etcd,master   144m   v1.31.7+rke2r1
mgmt-1830539845-rke2-capo-md0-pd254-64qsn       Ready    <none>                      123m   v1.31.7+rke2r1
mgmt-1830539845-rke2-capo-md0-pd254-kkhpb       Ready    <none>                      115m   v1.31.7+rke2r1
mgmt-1830539845-rke2-capo-md0-pd254-p7k5q       Ready    <none>                      119m   v1.31.7+rke2r1
🎉 All done
job link: https://gitlab.com/sylva-projects/sylva-core/-/jobs/10113967042#L2939
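
For context, here is a minimal sketch of how I read the status output above. It assumes the health check simply compares metadata.generation with status.observedGeneration, as the "generation is 2, but latest observed generation is 1" message suggests. The function name is_in_progress is hypothetical and this is not the actual health-check code; the values in the dict are copied from the log.

```python
# Hypothetical, simplified sketch of a generation-based readiness check.
# It is NOT the real health-check implementation, only an illustration of
# the comparison that the log message seems to describe.

def is_in_progress(resource: dict) -> bool:
    """Return True when the observed generation lags behind the spec generation."""
    generation = resource.get("metadata", {}).get("generation")
    observed = resource.get("status", {}).get("observedGeneration")
    if generation is None or observed is None:
        # Nothing to compare; this sketch does not cover that case.
        return False
    return observed != generation


# Values taken from the status output in this issue.
cirros_vm = {
    "metadata": {"generation": 2},
    "status": {
        "observedGeneration": 1,
        "conditions": [
            {"type": "Ready", "status": "True"},
            {"type": "LiveMigratable", "status": "True"},
            {"type": "RestartRequired", "status": "True"},
        ],
    },
}

print(is_in_progress(cirros_vm))
```

Under that assumption, the VirtualMachine is reported as InProgress even though its Ready condition is True, because the generation mismatch is evaluated independently of the conditions.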