sylvactl stops waiting, wrongly considering that a non-fatal error is fatal ("nothing in os_images for image_key...")
update-workload-cluster jobs are currently failing on:
HelmRelease/kubeadm-capm3-virt/cluster InProgress HelmRelease generation is 2, but latest observed generation is 1
╰┄╴┬┄┄[Conditions]
├┄╴Reconciling True Progressing Running 'upgrade' action with timeout of 5m0s
├┄╴Ready False UpgradeFailed Helm upgrade failed for release kubeadm-capm3-virt/cluster with chart sylva-capi-cluster@0.0.0+172dac5dd71b: execution error at (sylva-capi-cluster/templates/workers.yaml:78:146): .capm3/.machine_deployment_default.capm3/.machine_deployments.md0.capm3: nothing in os_images for image_key opensuse-15-6-plain-kubeadm-1-28-14
╰┄╴Released False UpgradeFailed Helm upgrade failed for release kubeadm-capm3-virt/cluster with chart sylva-capi-cluster@0.0.0+172dac5dd71b: execution error at (sylva-capi-cluster/templates/workers.yaml:78:146): .capm3/.machine_deployment_default.capm3/.machine_deployments.md0.capm3: nothing in os_images for image_key opensuse-15-6-plain-kubeadm-1-28-14
This is because, on a workload cluster update on capm3, the starting point (before the new state is applied) already has the cluster HelmRelease in error: the chart reads the os-images-info ConfigMap to find information on the image used for the initial deployment (here opensuse-15-6-plain-kubeadm-1-28-14), but that entry can no longer be found, because the mgmt cluster has been updated and no longer serves this image. This ConfigMap is refreshed automatically in all workload cluster namespaces to reflect what the mgmt cluster actually serves, which is why opensuse-15-6-plain-kubeadm-1-28-14 is no longer present in it.
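To illustrate the lookup that breaks, here is a minimal Go sketch (a hypothetical helper, not sylva code; it assumes the ConfigMap exposes one data entry per image key, which may not match the chart's real layout):

```go
package osimages

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// lookupOSImage reads the os-images-info ConfigMap in the workload cluster
// namespace and returns the entry for imageKey. Because the ConfigMap is
// continuously refreshed to mirror what the mgmt cluster serves, a key that
// was valid at initial deployment time can disappear after a mgmt cluster
// update, which is exactly the failure reported by the chart rendering.
func lookupOSImage(ctx context.Context, client kubernetes.Interface, namespace, imageKey string) (string, error) {
	cm, err := client.CoreV1().ConfigMaps(namespace).Get(ctx, "os-images-info", metav1.GetOptions{})
	if err != nil {
		return "", fmt.Errorf("reading os-images-info: %w", err)
	}
	entry, ok := cm.Data[imageKey]
	if !ok {
		// Mirrors the Helm template error seen in the job logs.
		return "", fmt.Errorf("nothing in os_images for image_key %s", imageKey)
	}
	return entry, nil
}
```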
This isn't a new behavior. What is new is that, since the recent upgrade of sylvactl, it now observes the status of HelmReleases and treats this transient error state as fatal.
Here, sylvactl should not conclude that this error is fatal; it should keep waiting.
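A possible decision rule is sketched below in Go. This is only an assumption about how such a check could look, not sylvactl's actual code, and it assumes the Flux controller follows the kstatus convention of marking terminal failures with a Stalled condition: as long as the HelmRelease's observed generation lags behind its generation, an UpgradeFailed condition still describes the old desired state and should be treated as transient.

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// isFatal reports whether a HelmRelease failure should stop the wait.
// A Ready=False/UpgradeFailed condition is not conclusive on its own:
// while observedGeneration < generation, the status still refers to the
// previous desired state, so the waiter should keep polling.
func isFatal(generation, observedGeneration int64, conditions []metav1.Condition) bool {
	if observedGeneration < generation {
		// Same situation as in the failing job: generation 2 not yet observed.
		return false
	}
	// Assumption: the controller marks terminal failures with Stalled=True
	// (kstatus convention); only then is the error worth giving up on.
	if stalled := meta.FindStatusCondition(conditions, "Stalled"); stalled != nil &&
		stalled.Status == metav1.ConditionTrue {
		return true
	}
	return false
}

func main() {
	// Conditions as reported by the failing update-workload-cluster job.
	conds := []metav1.Condition{
		{Type: "Ready", Status: metav1.ConditionFalse, Reason: "UpgradeFailed"},
		{Type: "Released", Status: metav1.ConditionFalse, Reason: "UpgradeFailed"},
	}
	// generation=2, observedGeneration=1 -> transient, keep waiting.
	fmt.Println("fatal:", isFatal(2, 1, conds))
}
```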