libvirt-metal BareMetalHost fails to provision when upgrading from Sylva 1.5

In https://gitlab.com/sylva-projects/sylva-core/-/jobs/12078051754 we see a problem, also present on main, when upgrading a libvirt-metal deployment from Sylva 1.5.

The high-level symptom is the cluster unit failing to progress, because one Machine remains stuck with "associated node not found".

The root cause can be seen in the BareMetalHost resource status:

  status:
    errorCount: 1
    errorMessage: 'Image provisioning failed: Deploy step deploy.write_image failed
      on node 9e3d5a91-b15f-41d9-95c3-2173316ece50. No suitable device was found for
      deployment using these hints {''name'': ''s== /dev/sda''}'
    errorType: provisioning error

This is due to !4947 (merged), which changed rootDeviceHints.deviceName from vda to sda to match the new libvirt-metal hardware profile. That value works on main, but it is invalid for hardware that was deployed with the previous hardware profile, the one used in Sylva 1.5. The servers in the CI scenarios that upgrade from Sylva 1.5 still have that previous profile.
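To make the mismatch concrete, here is a minimal sketch of the relevant part of a Metal3 BareMetalHost spec. The host name is hypothetical, and the exact place where sylva-core sets this value is not shown here. The `/dev/sda` value is taken from the error message above; `/dev/vda` for the older profile is an assumption based on the "from vda to sda" change.

```yaml
# Illustrative fragment only; metadata.name is a hypothetical example.
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: example-libvirt-metal-host
spec:
  rootDeviceHints:
    # Value after !4947, matching the new libvirt-metal hardware profile.
    # Hosts created with the Sylva 1.5 profile presumably expose /dev/vda,
    # so no device matches this hint and deploy.write_image fails.
    deviceName: /dev/sda
```

The `s== /dev/sda` string in the error appears to be this hint rendered as a string-equality match on the device name.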


In some other jobs (https://gitlab.com/sylva-projects/sylva-core/-/jobs/12082946558, in main nightly runs) we have the same root issue, although the high-level symptom is slightly different (Machine stuck on "WaitingForNodeRef"). This time the error is only visible in the libvirt-metal VM console (bootstrap-cluster-dump/sylva-system/libvirt-metal-management-cp-1-0/logs.txt) and as events on BareMetalHost resources.

Edited Nov 14, 2025 by Thomas Morin