capm3 - unwanted node rolling update on workload clusters during mgmt cluster update
An unwanted node rolling update can be triggered on capm3 workload clusters when the OS images served by the mgmt cluster are changed, under the conditions described below.
This issue is common to Sylva 1.3 and Sylva "pre-1.4" main.
Typical summary example:
- workload clusters use a specific image (`ubuntu-noble-hardened-rke2-1-30-9` or an OS image selector)
- the mgmt cluster is updated and, for the same image key, a newer image is now provided (no change of Kubernetes patch version, only base OS changes)
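For illustration, here is a minimal sketch of the two ways a workload cluster can reference an image in its values; `image_key` appears in the scenarios below, while the selector field names are assumptions and may not match the actual sylva-capi-cluster values schema:

```yaml
# Sylva 1.3 style: a "loose" image key (no diskimage-builder version included)
image_key: ubuntu-noble-hardened-rke2-1-30-9

# Sylva main style: an OS image selector, equally "loose"
# (field names below are hypothetical, for illustration only)
os_image_selector:
  os: ubuntu-noble
  flavor: hardened
  kubernetes_version: "1.30.9"
  kubernetes_flavor: rke2
```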
Looking more closely, the conditions are not exactly the same for 1.3 and main.

What is common:

- an "image key" or "OS image selector" does not directly indicate which OS image is deployed (neither includes diskimage-builder information); it is only a "loose" indication
- information about which OS images are served by the mgmt cluster is taken into account to determine the exact image to use (identified by its SHA sum) from the "loose" indication formulated by the image key or OS image selector; a sketch of this resolved information follows
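To make this concrete, here is a rough sketch of the kind of data the resolved os-images-info carries; the layout and field names are illustrative, not the actual schema:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: os-images-info
data:
  values.yaml: |
    # The "loose" key resolves to a concrete artifact. The diskimage-builder
    # version is not part of the key, so the same key can resolve to a
    # different artifact (different SHA256 sum) after a mgmt cluster update.
    os_images:
      ubuntu-noble-hardened-rke2-1-30-9:
        image_url: https://os-image-server.example/ubuntu-noble-hardened-rke2-1-30-9.qcow2
        image_checksum: "0123abcd..."   # SHA256; changes when the image is rebuilt
```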
In Sylva 1.3 the problematic scenario is the following:

- a workload cluster has `image_key: foo`
- the mgmt cluster is updated and the OS image for `foo` still exists in the newer settings, but with a different content
  - this can arise for instance if the diskimage-builder version was incremented to incorporate base OS changes without a change of Kubernetes version
  - if the Kubernetes version changes, given how we typically choose image keys (e.g. `ubuntu-jammy-plain-rke2-1-29-13`), the image key will differ and the scenario here does not apply
- when the os-images-info unit of the sylva-units mgmt cluster Helm release is reconciled, it will produce an updated os-images-info ConfigMap
- this ConfigMap is copied at once into workload cluster namespaces (by a Kyverno policy, sketched after this list) under the `kyverno-cloned-os-images-info-capm3` name
- this `kyverno-cloned-os-images-info-capm3` ConfigMap is used as input to the `cluster` unit (valuesFrom of the HelmRelease of sylva-capi-cluster)
- at the next periodic reconciliation of the `cluster` HelmRelease, the new content of `kyverno-cloned-os-images-info-capm3` will be used, resulting in the creation of new Metal3MachineTemplates pointing to the new image (URL typically unchanged, but the SHA256 sum will have changed)
- this will trigger a node rolling update
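The "copied at once" behavior comes from a Kyverno generate-and-clone rule with synchronization enabled. Here is a minimal sketch of such a policy; the policy name, match criteria and source namespace are assumptions, not the actual Sylva policy:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: clone-os-images-info-capm3   # hypothetical name
spec:
  rules:
    - name: clone-os-images-info
      match:
        any:
          - resources:
              kinds:
                - Namespace          # workload cluster namespaces
      generate:
        apiVersion: v1
        kind: ConfigMap
        name: kyverno-cloned-os-images-info-capm3
        namespace: "{{request.object.metadata.name}}"
        synchronize: true            # source updates propagate immediately
        clone:
          namespace: sylva-system    # assumed source namespace
          name: os-images-info
```

With `synchronize: true`, any change to the source ConfigMap is pushed to every clone, which is why a mgmt cluster update immediately alters the values seen by all workload clusters.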
In Sylva `main` the problematic scenario is:

- a workload cluster has a given OS image selector for its cluster
- it matches a given image X
- the mgmt cluster is updated and, similarly as above, there's a newer image for key X (e.g. X was `ubuntu-noble-hardened-rke2-1-31-5` and after the update, the image for `ubuntu-noble-hardened-rke2-1-31-5` is still here but it has base OS updates), or a different OS image Y now matches the OS selector (I don't think we have a practical case that would do this today, but given the flexibility of the OS image selector framework, it could occur)
- the new image X and/or the new image Y will be served by os-image-server
- os-image-server will produce an updated `capm3-os-image-server-os-images-info` ConfigMap (in the `os-image-server` namespace)
- this ConfigMap will be cloned as the `kyverno-cloned-os-images-info-capm3` ConfigMap in all workload cluster namespaces
- at the next reconciliation of the `cluster` HelmRelease, the new information will be used, triggering an update of the Metal3MachineTemplate (with either a different image Y with a fully different URL and checksum, or a newer image X with a new SHA sum and a URL where only the included sylva diskimage-builder version changes)
- a node rolling update is triggered (the sketch after this list shows how the ConfigMap content ends up in the Metal3MachineTemplate)
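In both scenarios the final step is the same: the cloned ConfigMap feeds the sylva-capi-cluster HelmRelease via valuesFrom, and the rendered output includes Metal3MachineTemplates. A simplified sketch of this chain (resource names and the values layout are illustrative):

```yaml
# The cluster HelmRelease consumes the cloned ConfigMap as Helm values:
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: cluster
spec:
  # chart definition omitted
  valuesFrom:
    - kind: ConfigMap
      name: kyverno-cloned-os-images-info-capm3
---
# Rendered result: a Metal3MachineTemplate pointing at the resolved image.
# Any change to url or checksum produces a new template, and Cluster API
# reacts by rolling the machines.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3MachineTemplate
metadata:
  name: my-cluster-control-plane   # illustrative name
spec:
  template:
    spec:
      image:
        url: https://os-image-server.example/ubuntu-noble-hardened-rke2-1-30-9.qcow2
        checksum: "0123abcd..."    # SHA256 sum: this is what changes for image X
        checksumType: sha256
        format: qcow2
```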
This issue is a question of lifecycle "coupling" between mgmt and workload clusters.
The solution seems to be to decorrelate two things:
- producing os-images-info for the relevant images based only on workload cluster data (similarly to what we do for openstack)
- having the information, in a workload cluster context, about which OS images are served by the mgmt cluster (unambiguously identified by the SHA sum, since the "image key" is ambiguous)
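As a very rough sketch of what this decorrelation could look like, with purely hypothetical field names (a direction, not a design):

```yaml
# 1. os-images-info rendered per workload cluster, from that cluster's own
#    values: the image is pinned unambiguously by its SHA sum, so a mgmt
#    cluster update does not silently change it.
os_images:
  ubuntu-noble-hardened-rke2-1-30-9:
    image_checksum: "0123abcd..."    # pinned by the workload cluster

# 2. a separate, informational view of what the mgmt cluster currently
#    serves, usable to detect and report a mismatch without feeding the
#    cluster HelmRelease values (and thus without triggering a rollout).
served_os_images:
  - checksum: "4567ef01..."          # newer build of the same image key
```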
/cc @feleouet @cristian.manda @mihai.zaharia @mederic.deverdilhac @rletrocquer