BareMetalHosts powered off without reason during rolling updates
During rolling updates, some machines remain stuck in the Provisioning phase, as in this CI run: https://gitlab.com/sylva-projects/sylva-core/-/jobs/7852801460
```
v1.27.16
sylva-system mgmt-1458191268-kubeadm-capm3-virt-control-plane-dblfv mgmt-1458191268-kubeadm-capm3-virt Provisioning 54m v1.28.12
sylva-system mgmt-1458191268-kubeadm-capm3-virt-control-plane-rt8bw mgmt-1458191268-kubeadm-capm3-virt mgmt-1458191268-kubeadm-capm3-virt-management-cp-2 metal3://sylva-system/mgmt-1458191268-kubeadm-capm3-virt-management-cp-2/mgmt-1458191268-kubeadm-capm3-virt-control-plane-rt8bw Running 92m v1.28.12
sylva-system mgmt-1458191268-kubeadm-capm3-virt-control-plane-s7qbm mgmt-1458191268-kubeadm-capm3-virt mgmt-1458191268-kubeadm-capm3-virt-management-cp-0 metal3://sylva-system/mgmt-1458191268-kubeadm-capm3-virt-management-cp-0/mgmt-1458191268-kubeadm-capm3-virt-control-plane-s7qbm Running 92m v1.28.12
sylva-system mgmt-1458191268-kubeadm-capm3-virt-md0-jbzpl-q56fg mgmt-1458191268-kubeadm-capm3-virt mgmt-1458191268-kubeadm-capm3-virt-management-md-0 metal3://sylva-system/mgmt-1458191268-kubeadm-capm3-virt-management-md-0/mgmt-1458191268-kubeadm-capm3-virt-md0-jbzpl-q56fg Running 56m v1.28.12
```
The BMH is in `provisioning` state, with `online: true`:
```yaml
operationHistory:
  deprovision:
    end: "2024-09-18T08:49:48Z"
    start: "2024-09-18T08:49:37Z"
  inspect:
    end: "2024-09-18T07:51:40Z"
    start: "2024-09-18T07:46:51Z"
  provision:
    end: null
    start: "2024-09-18T08:57:52Z"
  register:
    end: "2024-09-18T08:49:52Z"
    start: "2024-09-18T08:49:51Z"
```
But the libvirt console logs show that the machine has been powered off:
```
[2024-09-18 08:58:35,610] INFO in main: System "c0014001-b10b-f001-c0de-feeb1e54ee15" power state set to "ForceOff"
```
Looking at the Ironic logs in the bootstrap cluster, we can see that it is Ironic that powered off the VM:
```
2024-09-18 08:58:35.183 1 DEBUG sushy.connector [None req-4fd42ab1-69bd-4f90-a788-f73ed40ee46c - - - - - -] HTTP request: POST https://172.18.0.2:8001/redfish/v1/Systems/c0014001-b10b-f001-c0de-feeb1e54ee15/Actions/ComputerSystem.Reset; headers: {'Content-Type': 'application/json', 'OData-Version': '4.0'}; body: {'ResetType': 'ForceOff'}; blocking: False; timeout: 60; session arguments: {}; _op /usr/lib/python3.11/site-packages/sushy/connector.py:149
2024-09-18 08:58:35.611 1 DEBUG sushy.connector [None req-4fd42ab1-69bd-4f90-a788-f73ed40ee46c - - - - - -] HTTP response for POST https://172.18.0.2:8001/redfish/v1/Systems/c0014001-b10b-f001-c0de-feeb1e54ee15/Actions/ComputerSystem.Reset: status code: 204 _op /usr/lib/python3.11/site-packages/sushy/connector.py:283
```
This is unexpected, as no BMH resources remain defined in the bootstrap cluster after pivoting, so there is probably a bug in Metal3. In the meantime, we should uninstall Ironic from the bootstrap cluster after pivoting.
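A minimal sketch of that workaround, assuming Ironic was installed on the bootstrap cluster as a Helm release; the kube context (`kind-bootstrap`), namespace (`baremetal-operator-system`), and release name (`ironic`) below are assumptions for illustration, not values taken from the sylva-core deployment:

```shell
# Workaround sketch (names are assumptions): once pivoting has completed,
# confirm no BareMetalHosts remain on the bootstrap cluster...
kubectl --context kind-bootstrap get bmh -A

# ...then remove Ironic from the bootstrap cluster so it can no longer
# issue Redfish power actions against hosts now owned by the management cluster.
helm --kube-context kind-bootstrap uninstall ironic -n baremetal-operator-system
```

This only avoids the symptom; the underlying question of why Ironic on the bootstrap cluster still acts on a host after pivot remains to be root-caused in Metal3.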
Edited by Francois Eleouet