Add unit to back up CAPI resources of all clusters

What does this MR do and why?

This Merge Request adds:

  • a unit_templates entry, backup-s3, which configures the target s3 bucket used to store backups (it will also be used by other backup mechanisms)
  • a unit, backup-capi-resources, which backs up the CAPI resources of all clusters

It's the first MR of a batch of two MRs:

All clusters are backed up in one go on the management cluster, stored in one file per namespace.

Because clusterctl move cannot move a single cluster when several clusters share a namespace, all the CAPI resources are backed up per namespace.
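The per-namespace granularity can be sketched as follows. This is only an illustration, not the script shipped in this MR: the helper names and the example cluster names are hypothetical, and the clusterctl flags (--namespace, --to-directory) are assumed from the clusterctl CLI and from the "Moving to directory..." line in the logs.

```python
from typing import Dict, List, Tuple


def group_by_namespace(clusters: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (namespace, cluster name) pairs by namespace."""
    grouped: Dict[str, List[str]] = {}
    for namespace, name in clusters:
        grouped.setdefault(namespace, []).append(name)
    return grouped


def move_command(namespace: str, directory: str) -> List[str]:
    """Build one clusterctl invocation per namespace (flags are an assumption)."""
    return ["clusterctl", "move", "--namespace", namespace, "--to-directory", directory]


# Hypothetical inventory: two clusters share the namespace "team-a".
clusters = [
    ("sylva-system", "management-cluster"),
    ("team-a", "cluster-1"),
    ("team-a", "cluster-2"),
]
for namespace in group_by_namespace(clusters):
    print(" ".join(move_command(namespace, f"/tmp/{namespace}_capi_resources_backup")))
```

Both clusters of the shared namespace end up in the same invocation, hence in the same backup file.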

It uses kube-cronjob to perform this backup periodically. Backups are not versioned for now, but a timestamp can be added to the file names to avoid overwriting the previous backup if the target s3 bucket does not support versioning.
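A minimal sketch of how the timestamped option could affect file names. The base name follows the pattern visible in the logs (<namespace>_capi_resources_backup.tar.gz); the function name and the timestamp layout are assumptions, and the actual script may use a different format.

```python
from datetime import datetime, timezone
from typing import Optional


def backup_object_name(namespace: str, timestamped: bool,
                       now: Optional[datetime] = None) -> str:
    """Return the archive name for one namespace's backup."""
    base = f"{namespace}_capi_resources_backup"
    if timestamped:
        # Assumed layout: sortable UTC timestamp appended to the base name.
        now = now or datetime.now(timezone.utc)
        base = f"{base}_{now.strftime('%Y%m%dT%H%M%SZ')}"
    return f"{base}.tar.gz"


print(backup_object_name("sylva-system", timestamped=False))
print(backup_object_name("sylva-system", timestamped=True))
```

Without the timestamp, each run overwrites the same object, which is acceptable only when the bucket itself keeps versions.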

The backup operation consists of:

  • building a tar.gz from the output of clusterctl move to save the cluster configuration (plus a list of resources provided as a parameter, ConfigMap/capo-cluster-resources for now)
  • saving this tar.gz to an s3 bucket
    • the s3 bucket configuration is provided through a dedicated backup section at the root of values.yaml:
      backup:
        store:
          timestamped: false
          s3:
            host: <s3-host>
            accessKey: <ak>
            secretKey: <sk>
            bucket: <bucket name>
            cert: <s3-host certificate if relevant>
  • if the pushgateway unit is enabled, sending the backup results as metrics to Prometheus through the pushgateway.
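The compression step can be illustrated with the Python standard library. This is a sketch under assumptions, not the code of this MR: the function name is hypothetical, and the upload to s3 and the metrics push are left out.

```python
import tarfile
from pathlib import Path


def compress_backup(source_dir: Path, archive_path: Path) -> Path:
    """Pack the directory written by the move step into a tar.gz archive."""
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname keeps only the directory name, not the temporary parent path.
        archive.add(source_dir, arcname=source_dir.name)
    return archive_path
```

For example, compress_backup(Path("/tmp/work/sylva-system_capi_resources_backup"), Path("/tmp/work/backup.tar.gz")) would produce an archive whose entries all sit under sylva-system_capi_resources_backup/ (the paths are illustrative).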

Related reference(s)

fix #1784 (closed)

Test coverage

Manual for now, covering the management cluster as well as a workload cluster, and producing the following logs:

-- Set kubectl configuration.
Cluster "internal" set.
User "user" set.
Context "internal" created.
Switched to context "internal".
List of namespaces to backup : my-rke2-capo-workload sylva-system

-- Start backing up clusters from namespace 'my-rke2-capo-workload'.
Moving to directory...
Discovering Cluster API objects
Starting move of Cluster API objects Clusters=1
Moving Cluster API objects ClusterClasses=0
Saving files to /tmp/tmp.LkkniL/my-rke2-capo-workload_capi_resources_backup
-- Clusters backed up.
-- Backup compressed.
Added `backup` successfully.
`/tmp/tmp.BHCgAN` -> `backup/sylva-backup/my-rke2-capo-workload_capi_resources_backup.tar.gz`
Total: 62.97 KiB, Transferred: 62.97 KiB, Speed: 3.30 MiB/s
-- Backup uploaded
Backup succeeded in 123 seconds
-- Push result to the pushgateway

-- Start backing up clusters from namespace 'sylva-system'.
Moving to directory...
Discovering Cluster API objects
Starting move of Cluster API objects Clusters=1
Moving Cluster API objects ClusterClasses=0
Saving files to /tmp/tmp.OMaIbg/sylva-system_capi_resources_backup
-- Clusters backed up.
-- Backup compressed.
Added `backup` successfully.
`/tmp/tmp.dcMpgL` -> `backup/sylva-backup/sylva-system_capi_resources_backup.tar.gz`
Total: 72.93 KiB, Transferred: 72.93 KiB, Speed: 3.97 MiB/s
-- Backup uploaded
Backup succeeded in 91 seconds
-- Push result to the pushgateway

Backup summary:
    2 Succeeded: my-rke2-capo-workload sylva-system
    0 Failed   :

If Prometheus is deployed, the following metrics are available:

image
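For readers unfamiliar with the pushgateway, pushing a result amounts to sending a text-format payload to /metrics/job/<job>/<label>/<value>. The sketch below shows the idea. The metric names, the job name and the backup_namespace label are hypothetical and are not taken from this MR; the real metric names are the ones shown in the screenshot.

```python
from urllib import request


def metrics_body(succeeded: bool, duration_seconds: float) -> bytes:
    """Render two gauges in the Prometheus text exposition format."""
    lines = [
        "# TYPE capi_backup_success gauge",  # hypothetical metric name
        f"capi_backup_success {1 if succeeded else 0}",
        "# TYPE capi_backup_duration_seconds gauge",  # hypothetical metric name
        f"capi_backup_duration_seconds {duration_seconds}",
    ]
    # The payload must end with a newline.
    return ("\n".join(lines) + "\n").encode()


def push_url(base_url: str, job: str, namespace: str) -> str:
    """Build the grouping-key URL: one metric group per backed-up namespace."""
    return f"{base_url.rstrip('/')}/metrics/job/{job}/backup_namespace/{namespace}"


def push(base_url: str, job: str, namespace: str,
         succeeded: bool, duration_seconds: float) -> int:
    """PUT replaces every metric previously pushed for this grouping key."""
    req = request.Request(
        push_url(base_url, job, namespace),
        data=metrics_body(succeeded, duration_seconds),
        method="PUT",
    )
    with request.urlopen(req) as response:
        return response.status
```

Using the namespace as part of the grouping key keeps the result of each namespace separate, so a failed backup in one namespace does not overwrite the status of another.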

CI configuration

Below you can choose test deployment variants to run in this MR's CI.

Legend:

| Icon | Meaning | Available values |
| --- | --- | --- |
| ☁️ | Infra Provider | capd, capo, capm3 |
| 🚀 | Bootstrap Provider | kubeadm (alias kadm), rke2 |
| 🐧 | Node OS | ubuntu, suse |
| 🛠️ | Deployment Options | light-deploy, oci, ha, misc |
| 🎬 | Pipeline Scenarios | no-wkld, simple-update, simple-update-no-wkld, rolling-update, rolling-update-no-wkld, wkld-k8s-upgrade, nightly, sylva-upgrade, sylva-upgrade-no-wkld, sylva-upgrade-from-x.x.x, preview |

  • 🎬 preview ☁️ capd 🚀 kadm 🐧 ubuntu 🛠️ oci

  • 🎬 preview ☁️ capo 🚀 rke2 🐧 suse

  • 🎬 preview ☁️ capm3 🚀 rke2 🐧 ubuntu

  • ☁️ capd 🚀 kadm 🛠️ light-deploy 🐧 ubuntu

  • ☁️ capd 🚀 rke2 🛠️ oci,light-deploy 🐧 suse

  • ☁️ capo 🚀 rke2 🛠️ oci 🐧 suse

  • ☁️ capo 🚀 kadm 🛠️ oci 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🎬 rolling-update 🛠️ ha 🐧 ubuntu

  • ☁️ capo 🚀 kadm 🎬 wkld-k8s-upgrade 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🎬 rolling-update-no-wkld 🛠️ ha,misc 🐧 suse

  • ☁️ capo 🚀 rke2 🎬 sylva-upgrade-from-1.3.x 🛠️ ha,misc 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🐧 suse

  • ☁️ capm3 🚀 kadm 🛠️ oci 🐧 ubuntu

  • ☁️ capm3 🚀 kadm 🎬 rolling-update-no-wkld 🛠️ ha,misc 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🎬 wkld-k8s-upgrade 🛠️ ha 🐧 suse

  • ☁️ capm3 🚀 kadm 🎬 rolling-update 🛠️ ha 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🎬 sylva-upgrade-from-1.3.x 🛠️ misc,ha 🐧 suse

  • ☁️ capm3 🚀 kadm 🎬 rolling-update 🛠️ ha 🐧 suse

Global config for deployment pipelines

  • autorun pipelines
  • allow failure on pipelines

Notes:

  • Enabling autorun makes deployment pipelines run automatically, without human interaction.
  • Disabling allow failure makes deployment pipelines mandatory for pipeline success.
  • If both autorun and allow failure are disabled, deployment pipelines need manual triggering but still block the pipeline.

Be aware: after a configuration change, the pipeline is not triggered automatically. Please run it manually (by clicking the Run pipeline button in the Pipelines tab) or push new code.

Edited by Arnaud Bouts
