Add unit to back up etcd of the current management or workload cluster

What does this MR do and why?

Related reference(s)

fix: #1786 (closed)

This merge request adds a backup-etcd unit that backs up the etcd database of the cluster.

It is the second MR in a batch of two MRs.

It uses etcdctl, which connects to the etcd API using the etcd certificates. The location of these certificates depends on the distribution (/etc/kubernetes/pki/etcd/ for kubeadm and /var/lib/rancher/rke2/server/tls/etcd/ for rke2).
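
For illustration, a minimal snapshot command could look like the sketch below; the certificate file names (kubeadm layout), the endpoint and the snapshot path are assumptions, not values taken from this MR:

  # Sketch only: take an etcd snapshot using the kubeadm certificate layout.
  # On rke2 the certificates live under /var/lib/rancher/rke2/server/tls/etcd/ with different file names.
  ETCD_CERTS=/etc/kubernetes/pki/etcd
  ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot.db \
    --endpoints=https://127.0.0.1:2379 \
    --cacert="${ETCD_CERTS}/ca.crt" \
    --cert="${ETCD_CERTS}/server.crt" \
    --key="${ETCD_CERTS}/server.key"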

Reading these certificates also requires the uid/gid of etcd; that is why, when the backup is enabled, the etcd user is created with uid/gid 915 (the value can be changed) through pre-commands on the control plane (this works with both kubeadm and rke2).
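
As an illustration only, such a pre-command could resemble the following sketch (the exact commands and the permission adjustment are assumptions, not the ones shipped by this MR):

  # Sketch only: create an etcd user/group with a fixed uid/gid on a control-plane node.
  groupadd --system --gid 915 etcd || true
  useradd --system --uid 915 --gid 915 --no-create-home --shell /sbin/nologin etcd || true
  # Assumption for illustration: make the certificate directory readable by that group.
  chgrp -R etcd /etc/kubernetes/pki/etcd
  chmod -R g+rX /etc/kubernetes/pki/etcd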

It uses kube-cronjob to perform this backup periodically. Backups are not versioned for now, but a timestamp can be added to the file names to avoid overwriting previous backups if the target S3 bucket does not support versioning.
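
When the timestamped option is enabled, the archive name could be derived from the current time along these lines (a sketch; the exact naming scheme is an assumption):

  # Sketch only: build a timestamped archive name so that a run does not overwrite previous ones.
  TIMESTAMP=$(date -u +%Y-%m-%dT%H-%M-%SZ)
  ARCHIVE="etcd-backup-${TIMESTAMP}.tar.gz"   # a fixed name would be used when timestamped is false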

The backup operation consists of the following steps (a combined sketch is shown after this list):

  • build a tar.gz from an etcdctl snapshot save to archive the etcd database.
  • save this tar.gz to an S3 bucket
    • the S3 bucket configuration is provided through a dedicated section at the root of values.yaml:
      backup:
        store:
          timestamped: false
          s3:
            host: <s3-host>
            accessKey: <ak>
            secretKey: <sk>
            bucket: <bucket name>
            cert: <s3-host certificate if relevant>
  • if the pushgateway unit is enabled, backup results are sent as metrics to Prometheus through the pushgateway.
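
Putting the steps together, the body of the backup job could look roughly like the sketch below; the S3 client (MinIO's mc here), the bucket alias, the pushgateway address and the metric name are assumptions for illustration, not values defined by this MR:

  # Sketch only: archive the snapshot, upload it to S3 and report the result.
  tar czf "${ARCHIVE}" /backup/etcd-snapshot.db
  # Assumes an 'mc alias set backup-s3 ...' was configured from the host/accessKey/secretKey values above.
  mc cp "${ARCHIVE}" "backup-s3/${BUCKET}/${ARCHIVE}"
  # Assumes a reachable pushgateway service; the metric name is illustrative.
  printf 'etcd_backup_last_success_timestamp_seconds %s\n' "$(date +%s)" \
    | curl --data-binary @- "http://${PUSHGATEWAY}/metrics/job/etcd-backup"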

Test coverage

CI configuration

Below you can choose test deployment variants to run in this MR's CI.

Legend:

  • ☁️ Infra Provider: capd, capo, capm3
  • 🚀 Bootstrap Provider: kubeadm (alias kadm), rke2
  • 🐧 Node OS: ubuntu, suse
  • 🛠️ Deployment Options: light-deploy, dev-sources, ha, misc, maxsurge-0, logging
  • 🎬 Pipeline Scenarios: available scenario list and description
  • 🎬 preview ☁️ capd 🚀 kadm 🐧 ubuntu

  • 🎬 preview ☁️ capo 🚀 rke2 🐧 suse

  • 🎬 preview ☁️ capm3 🚀 rke2 🐧 ubuntu

  • ☁️ capd 🚀 kadm 🛠️ light-deploy 🐧 ubuntu

  • ☁️ capd 🚀 rke2 🛠️ light-deploy 🐧 suse

  • ☁️ capo 🚀 rke2 🐧 suse

  • ☁️ capo 🚀 kadm 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🎬 rolling-update 🛠️ ha 🐧 ubuntu

  • ☁️ capo 🚀 kadm 🎬 wkld-k8s-upgrade 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🎬 rolling-update-no-wkld 🛠️ ha,misc 🐧 suse

  • ☁️ capo 🚀 rke2 🎬 sylva-upgrade-from-1.3.x 🛠️ ha,misc 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🐧 suse

  • ☁️ capm3 🚀 kadm 🐧 ubuntu

  • ☁️ capm3 🚀 kadm 🎬 rolling-update-no-wkld 🛠️ ha,misc 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🎬 wkld-k8s-upgrade 🛠️ ha 🐧 suse

  • ☁️ capm3 🚀 kadm 🎬 rolling-update 🛠️ ha 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🎬 sylva-upgrade-from-1.3.x 🛠️ misc,ha 🐧 suse

  • ☁️ capm3 🚀 kadm 🎬 rolling-update 🛠️ ha 🐧 suse

Global config for deployment pipelines

  • autorun pipelines
  • allow failure on pipelines
  • record sylvactl events

Notes:

  • Enabling autorun will make deployment pipelines run automatically without human interaction
  • Disabling allow failure will make deployment pipelines mandatory for pipeline success
  • If both autorun and allow failure are disabled, deployment pipelines will need manual triggering but will block the pipeline

Be aware: after a configuration change, the pipeline is not triggered automatically. Please run it manually (by clicking the Run pipeline button in the Pipelines tab) or push new code.
