sylvactl stops waiting, wrongly considering that a non-fatal error is fatal ("nothing in os_images for image_key...")

update-workload-cluster jobs are currently failing on:

HelmRelease/kubeadm-capm3-virt/cluster               InProgress                    HelmRelease generation is 2, but latest observed generation is 1
╰┄╴┬┄┄[Conditions]
   ├┄╴Reconciling                                    True       Progressing        Running 'upgrade' action with timeout of 5m0s
   ├┄╴Ready                                          False      UpgradeFailed      Helm upgrade failed for release kubeadm-capm3-virt/cluster with chart sylva-capi-cluster@0.0.0+172dac5dd71b: execution error at (sylva-capi-cluster/templates/workers.yaml:78:146): .capm3/.machine_deployment_default.capm3/.machine_deployments.md0.capm3: nothing in os_images for image_key opensuse-15-6-plain-kubeadm-1-28-14
   ╰┄╴Released                                       False      UpgradeFailed      Helm upgrade failed for release kubeadm-capm3-virt/cluster with chart sylva-capi-cluster@0.0.0+172dac5dd71b: execution error at (sylva-capi-cluster/templates/workers.yaml:78:146): .capm3/.machine_deployment_default.capm3/.machine_deployments.md0.capm3: nothing in os_images for image_key opensuse-15-6-plain-kubeadm-1-28-14

This is because, when a workload cluster is updated on capm3, the starting point (before the new state is applied) is a state in which the cluster HelmRelease is already in error. The chart reads the os-images-info ConfigMap to find information about the image used for the initial deployment (here opensuse-15-6-plain-kubeadm-1-28-14), but can no longer find it: the mgmt cluster has already been updated and no longer serves that image. Since this ConfigMap is refreshed automatically in all workload cluster namespaces to reflect what the mgmt cluster actually serves, opensuse-15-6-plain-kubeadm-1-28-14 is no longer present in it.

This behavior isn't new. What is new is that, since the recent upgrade of sylvactl, it now observes the status of HelmReleases and treats this transient error state as a fatal error.

Here, sylvactl should not conclude that this error is fatal, and should keep waiting.
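To make the expected behavior concrete, here is a minimal Go sketch of one possible classification rule. It uses a simplified, hypothetical model of the HelmRelease status: the `Condition` and `HelmReleaseStatus` types and the `IsFatal` function are illustrative only and are not sylvactl's actual code. The assumed rule is that `Ready=False` is not treated as fatal while the controller has not yet observed the latest generation, or while `Reconciling` is still `True`. In both cases the failure may describe the previous spec rather than the new one.

```go
package main

import "fmt"

// Condition is a hypothetical, minimal stand-in for a Kubernetes status
// condition (metav1.Condition has similar Type/Status/Reason/Message fields).
type Condition struct {
	Type    string
	Status  string // "True", "False" or "Unknown"
	Reason  string
	Message string
}

// HelmReleaseStatus is hypothetical and holds only the fields this sketch needs.
type HelmReleaseStatus struct {
	Generation         int64 // metadata.generation
	ObservedGeneration int64 // status.observedGeneration
	Conditions         []Condition
}

// findCondition returns the condition of the given type, if present.
func findCondition(conds []Condition, t string) (Condition, bool) {
	for _, c := range conds {
		if c.Type == t {
			return c, true
		}
	}
	return Condition{}, false
}

// IsFatal reports whether a Ready=False state should stop the wait.
// Assumed heuristic: a failure is only considered final once the latest
// generation has been observed and no reconciliation is in progress.
func IsFatal(s HelmReleaseStatus) bool {
	ready, ok := findCondition(s.Conditions, "Ready")
	if !ok || ready.Status != "False" {
		return false // not failed (yet)
	}
	if s.ObservedGeneration < s.Generation {
		return false // stale status: the new spec has not been processed yet
	}
	if rec, ok := findCondition(s.Conditions, "Reconciling"); ok && rec.Status == "True" {
		return false // an action (e.g. 'upgrade') is still running
	}
	return true
}

func main() {
	// Mirrors the status from the failing job: generation 2, observed 1,
	// Reconciling=True and Ready=False with reason UpgradeFailed.
	s := HelmReleaseStatus{
		Generation:         2,
		ObservedGeneration: 1,
		Conditions: []Condition{
			{Type: "Reconciling", Status: "True", Reason: "Progressing"},
			{Type: "Ready", Status: "False", Reason: "UpgradeFailed"},
		},
	}
	fmt.Println(IsFatal(s)) // false
}
```

Under this rule, the state shown above keeps sylvactl waiting. A `Ready=False` reported against the current generation with no reconciliation in progress would still stop it. An overall timeout would remain necessary as a backstop for errors that never clear.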
