Add unit to backup CAPI resources of all clusters (!3813) · Merge requests · Sylva-projects / sylva-core

What does this MR do and why?

This merge request adds:

a unit_templates entry, backup-s3, which configures the target S3 bucket used to store backups (it will also be used by other backup mechanisms)
a unit, backup-capi-resources, which backs up the CAPI resources of all clusters

It is the first of a batch of two MRs:

!3813 (merged), which adds the backup-capi-resources unit
!4288 (merged), which adds the backup-etcd unit

All clusters are backed up in one go on the management cluster, stored in one file per namespace.

clusterctl move cannot move a single cluster when several clusters share the same namespace, so all the CAPI resources are backed up per namespace.
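As an illustration of the per-namespace granularity, here is a minimal Python sketch that groups clusters by namespace and derives one archive name per namespace. The function name `plan_backups` and the cluster names are hypothetical; only the `<namespace>_capi_resources_backup.tar.gz` naming pattern comes from the logs in this MR.

```python
from collections import defaultdict


def plan_backups(clusters):
    """Map each backup archive name to the clusters it would contain.

    `clusters` is an iterable of (namespace, cluster_name) pairs.
    All clusters of a namespace end up in the same archive.
    """
    by_namespace = defaultdict(list)
    for namespace, name in clusters:
        by_namespace[namespace].append(name)
    return {
        f"{namespace}_capi_resources_backup.tar.gz": sorted(names)
        for namespace, names in sorted(by_namespace.items())
    }


# Hypothetical inventory: two workload clusters share one namespace.
plan = plan_backups([
    ("sylva-system", "management-cluster"),
    ("my-rke2-capo-workload", "workload-a"),
    ("my-rke2-capo-workload", "workload-b"),
])
```

With this inventory, `plan` contains two archives, and the workload archive covers both `workload-a` and `workload-b`.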

It uses kube-cronjob to perform this backup periodically. Backups are not versioned for now, but a timestamp can be added to the file names to avoid overwriting the previous backup if the target S3 bucket does not support versioning.
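The effect of the timestamp option on file names can be sketched as follows. This is an assumption-laden illustration, not the unit's actual script: the helper name `backup_object_name` and the timestamp format are hypothetical, and only the base name pattern is taken from the logs in this MR.

```python
from datetime import datetime, timezone
from typing import Optional


def backup_object_name(namespace: str, timestamped: bool,
                       now: Optional[datetime] = None) -> str:
    """Return the archive name for a namespace backup.

    Without a timestamp, each run overwrites the previous object.
    With a timestamp, each run produces a distinct object name.
    """
    base = f"{namespace}_capi_resources_backup"
    if timestamped:
        now = now or datetime.now(timezone.utc)
        # Hypothetical format; the real unit may use another one.
        base = f"{base}_{now.strftime('%Y%m%dT%H%M%SZ')}"
    return f"{base}.tar.gz"


fixed = datetime(2025, 5, 19, 12, 0, 0, tzinfo=timezone.utc)
plain_name = backup_object_name("sylva-system", timestamped=False)
stamped_name = backup_object_name("sylva-system", timestamped=True, now=fixed)
```

On a bucket without versioning, the timestamped variant keeps earlier backups at the cost of having to prune old objects separately.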

The backup operation consists of:

build a tar.gz from the output of clusterctl move to save the cluster configuration (plus a list of resources provided as a parameter; ConfigMap/capo-cluster-resources for now)

save this tar.gz to an S3 bucket

the S3 bucket configuration is provided through a dedicated section at the root of values.yaml:

backup:
  store:
    timestamped: false
    s3:
      host: <s3-host>
      accessKey: <ak>
      secretKey: <sk>
      bucket: <bucket name>
      cert: <s3-host certificate if relevant>
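The compression step of the backup operation can be sketched in Python as below. This is only an illustration under stated assumptions: the unit itself may rely on shell tools, the helper name `compress_backup` is hypothetical, and so is the file `Cluster_example.yaml` standing in for what clusterctl move writes to the directory.

```python
import tarfile
import tempfile
from pathlib import Path


def compress_backup(source_dir: Path, archive_path: Path) -> Path:
    """Pack a directory of saved CAPI resources into a tar.gz archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store paths relative to the directory name, not the temp path.
        tar.add(source_dir, arcname=source_dir.name)
    return archive_path


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "sylva-system_capi_resources_backup"
    src.mkdir()
    # Hypothetical stand-in for a file produced by clusterctl move.
    (src / "Cluster_example.yaml").write_text("kind: Cluster\n")
    archive = compress_backup(src, Path(tmp) / f"{src.name}.tar.gz")
    with tarfile.open(archive) as tar:
        names = sorted(tar.getnames())
```

The resulting archive would then be uploaded to the bucket described by the `backup.store.s3` values.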

If the pushgateway unit is enabled, backup results are sent as metrics to Prometheus through the Pushgateway.
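For readers unfamiliar with the Pushgateway, the following sketch builds the URL and text-format body of such a push without sending it. The metric names, the `namespace` grouping label, and the Pushgateway address are assumptions made for illustration; this MR does not state the names actually used by the unit.

```python
from urllib.parse import quote


def pushgateway_request(base_url: str, job: str, namespace: str,
                        success: bool, duration_seconds: int):
    """Build the URL and body for pushing one backup result.

    The Pushgateway accepts metrics in the Prometheus text format on
    /metrics/job/<job>/<label>/<value>.
    """
    url = (
        f"{base_url.rstrip('/')}/metrics/job/{quote(job, safe='')}"
        f"/namespace/{quote(namespace, safe='')}"
    )
    # Hypothetical metric names; the body must end with a newline.
    body = (
        "# TYPE capi_backup_success gauge\n"
        f"capi_backup_success {1 if success else 0}\n"
        "# TYPE capi_backup_duration_seconds gauge\n"
        f"capi_backup_duration_seconds {duration_seconds}\n"
    )
    return url, body


url, body = pushgateway_request(
    "http://pushgateway:9091/",  # hypothetical address
    "backup-capi-resources", "sylva-system", True, 91,
)
```

Sending `body` to `url` with an HTTP PUT or POST would register the metrics under the given job and namespace grouping.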

Related reference(s)

fix #1784 (closed)

Test coverage

Manual for now, on the management cluster as well as on a workload cluster, producing the following logs:

-- Set kubectl configuration.
Cluster "internal" set.
User "user" set.
Context "internal" created.
Switched to context "internal".
List of namespaces to backup : my-rke2-capo-workload sylva-system

-- Start backing up clusters from namespace 'my-rke2-capo-workload'.
Moving to directory...
Discovering Cluster API objects
Starting move of Cluster API objects Clusters=1
Moving Cluster API objects ClusterClasses=0
Saving files to /tmp/tmp.LkkniL/my-rke2-capo-workload_capi_resources_backup
-- Clusters backed up.
-- Backup compressed.
Added `backup` successfully.
`/tmp/tmp.BHCgAN` -> `backup/sylva-backup/my-rke2-capo-workload_capi_resources_backup.tar.gz`
Total: 62.97 KiB, Transferred: 62.97 KiB, Speed: 3.30 MiB/s
-- Backup uploaded
Backup succeeded in 123 seconds
-- Push result to the pushgateway

-- Start backing up clusters from namespace 'sylva-system'.
Moving to directory...
Discovering Cluster API objects
Starting move of Cluster API objects Clusters=1
Moving Cluster API objects ClusterClasses=0
Saving files to /tmp/tmp.OMaIbg/sylva-system_capi_resources_backup
-- Clusters backed up.
-- Backup compressed.
Added `backup` successfully.
`/tmp/tmp.dcMpgL` -> `backup/sylva-backup/sylva-system_capi_resources_backup.tar.gz`
Total: 72.93 KiB, Transferred: 72.93 KiB, Speed: 3.97 MiB/s
-- Backup uploaded
Backup succeeded in 91 seconds
-- Push result to the pushgateway

Backup summary:
    2 Succeeded: my-rke2-capo-workload sylva-system
    0 Failed   :

If Prometheus is deployed, the following metrics are available:

CI configuration

Below you can choose test deployment variants to run in this MR's CI.


Legend:

Icon	Meaning	Available values
☁️	Infra Provider	`capd`, `capo`, `capm3`
🚀	Bootstrap Provider	`kubeadm` (alias `kadm`), `rke2`
🐧	Node OS	`ubuntu`, `suse`
🛠️	Deployment Options	`light-deploy`, `oci`, `ha`, `misc`
🎬	Pipeline Scenarios	`no-wkld` `simple-update` `simple-update-no-wkld` `rolling-update` `rolling-update-no-wkld` `wkld-k8s-upgrade` `nightly` `sylva-upgrade` `sylva-upgrade-no-wkld` `sylva-upgrade-from-x.x.x` `preview`

Global config for deployment pipelines

autorun pipelines
allow failure on pipelines

Notes:

Enabling autorun makes deployment pipelines run automatically, without human interaction.
Disabling allow failure makes deployment pipelines mandatory for pipeline success.
If both autorun and allow failure are disabled, deployment pipelines need manual triggering but block the pipeline.

Be aware: after a configuration change, the pipeline is not triggered automatically. Please run it manually (by clicking the Run pipeline button in the Pipelines tab) or push new code.

Edited May 19, 2025 by Arnaud Bouts
