The image URL in the management cluster's Metal3MachineTemplate still refers to the bootstrap cluster after the pivot phase
In the context of CAPM3, when the management cluster is installed, the BMHs fetch the OS image from the bootstrap cluster. The image URI is made available to the BMHs via the Metal3MachineTemplate resource:
nodeReuse: true
template:
  spec:
    dataTemplate:
      name: management-cluster-cp-metadata-7ef7b3e4bb
    hostSelector:
      matchLabels:
        cluster-role: control-plane
        host-type: generic
    image:
      checksum: ba75b787504a7adf937a568d8d6927e9c1b4d70a47da1bbd7cd8fbbf86a9789e
      checksumType: sha256
      format: raw
      url: http://10.10.10.17/opensuse-15-6-hardened-rke2-1-29-9.raw  # <-- bootstrap cluster IP
Once the BMHs and the machines have been properly provisioned, the pivot phase and the destruction of the bootstrap cluster follow. During the pivot phase, several resources, including the Metal3MachineTemplates, are moved (copied) to the management cluster (except for the metal3MachineTemplate-cp, see issue #1839 (closed)). However, these templates still contain the URI of the image served by the bootstrap cluster, which no longer exists.
Reinstantiating the cluster Helm release on the management cluster does not resolve the issue, as drift detection is not enabled.
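For reference, a minimal sketch of what enabling drift detection could look like, assuming the cluster Helm release is a Flux `HelmRelease` (the resource name `cluster` and the API version are assumptions, and the other required fields such as `chart` and `interval` are left out):

```yaml
# Fragment only: merge into the existing HelmRelease definition.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: cluster            # assumed release name
spec:
  driftDetection:
    mode: enabled          # "warn" only reports drift without correcting it
```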
As a result, if a user wants to perform a rolling upgrade (for example, to change a kubelet parameter) without changing the OS image itself, the xMachineTemplate resources remain unchanged, and the rolling upgrade cannot proceed because the OS image referenced in the templates is no longer reachable.
(We don't see this problem in CI, since the bootstrap cluster is not deleted in capm3-libvirt.)
There are several solutions:
- Patch the Metal3MachineTemplate resources via a Kyverno policy on the management cluster
- Enable driftDetection on the cluster Helm release (already started via !2445 (merged), but it seems too ambitious before the release)
- Remove the ownerReference before the pivot phase, so that the resources are not pivoted and are instead recreated with the same name but with correct URIs when the cluster HelmRelease is instantiated on the management cluster (this introduces one bug to fix another)
- Remove the Metal3MachineTemplates before the first instantiation of the cluster HelmRelease?
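As an illustration of the first option, here is a rough sketch of a Kyverno mutation policy. It assumes the policy is installed on the management cluster before the pivot, so that it applies when the templates are created there. The policy and rule names are hypothetical, and `http://192.0.2.10/` is a placeholder for wherever the image is actually served on the management side; only `http://10.10.10.17/` comes from the template shown earlier.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: rewrite-m3mt-image-url        # hypothetical name
spec:
  rules:
    - name: replace-bootstrap-image-host   # hypothetical name
      match:
        any:
          - resources:
              kinds:
                - Metal3MachineTemplate
              operations:
                - CREATE               # applied when the pivot creates the object
      mutate:
        patchStrategicMerge:
          spec:
            template:
              spec:
                image:
                  # Swap the bootstrap cluster host for the placeholder
                  # management-side host; the rest of the URL is kept.
                  url: "{{ replace_all(request.object.spec.template.spec.image.url, 'http://10.10.10.17/', 'http://192.0.2.10/') }}"
```

Rewriting only the host part keeps the image file name and checksum untouched, which fits the case where the OS image itself does not change.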