simplify baremetal OS images settings

What we have today

In the sylva-units default values.yaml, default_os_images and os_images contain entries like this one to configure the os-image-server unit:

    ubuntu-jammy-plain-rke2-1-26-9:
      uri: "{{ .Values.sylva_base_oci_registry }}/sylva-elements/diskimage-builder/ubuntu-jammy-plain-rke2-1.26.9:0.0.12"
      filename: ubuntu-jammy-plain-rke2-1.26.9.qcow2
      checksum: ffdc81fcdc0104151aa792a508eefe0d47660b18683949edcd734b3a4f938f20
      persistence:
        enabled: true
        size: 3Gi

We can't list all images produced by diskimage-builder here, because this would result in os-image-server downloading all of them, which is useless and too heavy for a given deployment. So today we list a single image here.

Downstream of Sylva, we end up having to recreate the same data structure for the image we want to use.

Then we have sylva-capi-cluster, which needs this kind of config:

  capm3:
    machine_image_url: http://ubuntu-jammy-plain-rke2-1-26-9.os-images.svc.cluster.local:8080/ubuntu-jammy-plain-rke2-1.26.9.qcow2
    machine_image_format: qcow2
    machine_image_checksum: http://ubuntu-jammy-plain-rke2-1-26-9.os-images.svc.cluster.local:8080/ubuntu-jammy-plain-rke2-1.26.9.qcow2.sha256sum
    machine_image_checksum_type: sha256

So today, we have lots of grunt work and repetitive things to write when we want to use a given image for a given deployment:

have sylva-core use images from diskimage-builder (see #636 (closed)), at least if we keep wanting to have this kind of content for os_images (which I think we in fact don't need, see below)
get the right settings in `cluster.capm3.machine
have a way of ensuring that we don't deploy a k8s_version: X with an image built for Kubernetes version Y (we may not always want this check, but we would probably like to be able to enforce it sometimes)
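
The version check in the last item could look roughly like the sketch below. It assumes that image keys end with the Kubernetes version (as in ubuntu-jammy-plain-rke2-1.26.9) and that k8s_version may carry a leading "v" and a distribution suffix such as "+rke2r1"; both are assumptions made for illustration, and the function names are hypothetical.

```python
import re
from typing import Optional


def k8s_version_from_image_key(image_key: str) -> Optional[str]:
    """Return the trailing X.Y.Z version of an image key, or None if absent."""
    match = re.search(r"-(\d+\.\d+\.\d+)$", image_key)
    return match.group(1) if match else None


def image_matches_k8s_version(image_key: str, k8s_version: str) -> bool:
    """Compare the version embedded in the image key with k8s_version."""
    image_version = k8s_version_from_image_key(image_key)
    # Assumption: k8s_version looks like "1.26.9" or "v1.26.9+rke2r1".
    wanted = re.match(r"v?(\d+\.\d+\.\d+)", k8s_version)
    return bool(image_version and wanted and image_version == wanted.group(1))


def check_image(image_key: str, k8s_version: str, enforce: bool = True) -> None:
    """Fail on a mismatch when enforce is set, otherwise only warn."""
    if image_matches_k8s_version(image_key, k8s_version):
        return
    message = f"image {image_key!r} was not built for k8s_version {k8s_version!r}"
    if enforce:
        raise ValueError(message)
    print(f"warning: {message}")
```

The enforce flag reflects the point above that the check should be something we can turn into a hard failure only when we want to.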

Note that we also still don't leverage one interesting thing in Metal3: we can give the image checksum directly to the IPA, instead of giving it a URL where it can find it. Giving the checksum directly would be one step towards securing against MITM attacks in which the attacker intercepts the HTTP session used to download the OS image and inserts malicious content.

Here is what I propose:

A) stop our habit of giving the image checksum in os-image-server values (os_images) when we use an OCI artifact

registries have their own way of avoiding data corruption
if we care about provenance validation, we need to complete the implementation of OCI artifact signing and signature verification
if we want the os-image-server downloader to not have to spend a few tens of seconds computing the SHA256, we can let it fetch it from the OCI artifact annotations

B) simplify file names: we don't care what the actual filename is; we only need the one served by the os-image-server Ingress to match the one used in the sylva-capi-cluster URL (machine_image_url). We can standardize on image.qcow2 everywhere by default and things will just work

C) how to let Renovate bot update references in sylva-units to diskimage-builder artifacts:

have sylva-units refer once to a given diskimage-builder release

sylva_diskimagebuilder_version: 0.1.1

have a data structure in sylva-units, diskimage_builder_os_images, that lists all artifact basenames:

diskimage_builder_os_images:
  ubuntu-jammy-plain-rke2-1.26.9: {}
  ubuntu-jammy-plain-rke2-1.25.15: {}
  opensuse-15.5-plain-rke2-1.26.9: {}
  ubuntu-jammy-hardened-rke2-1.26.9: {}
  ubuntu-jammy-plain-kubeadm-1.26.9: {}

from this dict, generate os_images in sylva-units with templating, deriving each os_images key from the dict key (what precedes the :) and building the value from the rest (using sylva_diskimagebuilder_version to build the full OCI URL), without defining any filename or checksum
let's go further and assume that an entry is added to os_images only for keys that have an enabled: true field defined: this will allow users to select which images are prepared by os-image-server
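
As a rough illustration of the generation step above, here is a minimal Python sketch of the logic that the sylva-units templating would implement. The function name build_os_images and the registry value used in the usage note are hypothetical; the URI layout follows the existing os_images example, and no filename or checksum is produced.

```python
def build_os_images(diskimage_builder_os_images, version, registry):
    """Derive os_images entries for the images marked enabled: true.

    version plays the role of sylva_diskimagebuilder_version and registry
    the role of sylva_base_oci_registry.
    """
    os_images = {}
    for basename, settings in diskimage_builder_os_images.items():
        # Entries left as {} (or without enabled: true) are skipped, so
        # os-image-server only downloads what a deployment asked for.
        if not (settings or {}).get("enabled", False):
            continue
        os_images[basename] = {
            "uri": f"{registry}/sylva-elements/diskimage-builder/{basename}:{version}",
        }
    return os_images
```

For example, calling it with {"opensuse-15.5-plain-rke2-1.26.9": {"enabled": True}, "ubuntu-jammy-plain-rke2-1.26.9": {}}, version "0.1.1" and a placeholder registry "registry.example.com/sylva" yields a single os_images entry, for the openSUSE image only.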
Now let's see how to simplify the sylva-capi-cluster (s-c-c) capm3 config:

have the os-image-server downloader tool build a ConfigMap containing a dict like this:

os_images_info:
   ubuntu-jammy-plain-rke2-1.26.9:
     url: http://ubuntu-jammy-plain-rke2-1-26-9.os-images.svc.cluster.local:8080/ubuntu-jammy-plain-rke2-1-26-9/image.qcow2
     checksum: <checksum retrieved from the OCI artifact>
     checksumType: ...
     format: qcow2
   ubuntu-jammy-plain-rke2-1.25.15: {}
   opensuse-15.5-plain-rke2-1.26.9: {}
   ubuntu-jammy-hardened-rke2-1.26.9: {}
   ubuntu-jammy-plain-kubeadm-1.26.9: {}

this dict would be passed to sylva-capi-cluster HelmRelease with valuesFrom
sylva-capi-cluster would accept a new os_image_key key under capm3

cluster:
  capm3:
    os_image_key: ubuntu-jammy-plain-rke2-1.26.9

when this syntax is used, sylva-capi-cluster would build the Metal3MachineTemplate.spec.template.spec.image fields from os_images_info.$os_image_key.
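
The lookup described above can be sketched as follows. This is only an illustration in Python of what the sylva-capi-cluster chart templating would do; the function name is hypothetical, and the error handling for images that are listed but not enabled (entries left as {}) is an assumption.

```python
IMAGE_FIELDS = ("url", "checksum", "checksumType", "format")


def metal3_image_from_key(os_images_info, os_image_key):
    """Build the Metal3MachineTemplate image fields for one os_image_key."""
    if os_image_key not in os_images_info:
        raise ValueError(f"unknown os_image_key: {os_image_key!r}")
    info = os_images_info[os_image_key] or {}
    # An empty entry means the image is known but not served by
    # os-image-server in this deployment (assumption for this sketch).
    missing = [field for field in IMAGE_FIELDS if field not in info]
    if missing:
        raise ValueError(
            f"os_images_info[{os_image_key!r}] lacks {', '.join(missing)}; "
            "is the image enabled in diskimage_builder_os_images?"
        )
    return {field: info[field] for field in IMAGE_FIELDS}
```

Failing early on an empty entry makes a typo or a forgotten enabled: true visible at templating time rather than when a machine fails to provision.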

End result

In sylva-units values.yaml we would only have this:

 diskimage_builder_os_images:
   ubuntu-jammy-plain-rke2-1.26.9: {}
   ubuntu-jammy-plain-rke2-1.25.15: {}
   opensuse-15.5-plain-rke2-1.26.9: {}
   ubuntu-jammy-hardened-rke2-1.26.9: {}
   ubuntu-jammy-plain-kubeadm-1.26.9: {}

Renovate bot would update sylva_diskimagebuilder_version when a new diskimage-builder release is tagged.

For a given deployment people would have to:

specify which images they want os-image-server to serve

 diskimage_builder_os_images:
   opensuse-15.5-plain-rke2-1.26.9:
     enabled: true

They could even parametrize this:

 diskimage_builder_os_images:
   opensuse-15.5-plain-rke2-{{ .Values.cluster.k8s_version }}:
     enabled: true

for a given cluster (management or workload cluster), we would only need this kind of settings:

cluster:
  capm3:
    os_image_key: ubuntu-jammy-plain-rke2-1.26.9  ## again, {{ k8s_version }} could be used here
  control_plane: 
    capm3:
      os_image_key: ubuntu-jammy-hardened-rke2-1.26.9  ## example if a different image is wanted for the CP
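
One possible way to resolve which image key applies to a given group of machines is sketched below. The fallback from a group-level capm3.os_image_key to the cluster-level one is an assumption suggested by the control plane example, not something specified here, and the function name and the "workers" group used in the usage note are hypothetical.

```python
def resolve_os_image_key(cluster_values, group="control_plane"):
    """Return the os_image_key for a machine group, falling back to cluster.capm3."""
    default_key = (cluster_values.get("capm3") or {}).get("os_image_key")
    group_values = cluster_values.get(group) or {}
    group_key = (group_values.get("capm3") or {}).get("os_image_key")
    key = group_key or default_key
    if key is None:
        raise ValueError(f"no os_image_key defined for {group!r} or at cluster level")
    return key
```

With the values shown above, resolve_os_image_key(cluster, "control_plane") would return the hardened image key, while a group without its own setting, for example "workers", would get the cluster-level key.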

Related

This issue shares similarities with #528 (closed)

Edited Jan 19, 2024 by Thomas Morin
