integrate cluster-maxunavailable refactoring with associated CI testing

This MR integrates sylva-projects/sylva-elements/misc-controllers-suite!99 (merged).

The context is a refactoring to:

  • address #3234
  • improve CI testing so that there is an automated way to check that this controller does its job properly
    • this is done by introducing Prometheus metrics for CAPI Machines, defining a Prometheus alert for the situation where the cluster-maxunavailable controller has not done its job, and refactoring the CI job that checks Prometheus alerts so that it looks at alerts that fired at any time during the run (instead of only the alerts that were still firing when the job ran)
    • (this builds on !6575 (merged))
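The key change to the CI job is which alerts it considers. The following is a minimal sketch, not the actual mgmt-thanos-alert job code. It assumes the job queries Thanos through its Prometheus-compatible HTTP API, using a range query (`/api/v1/query_range`) over the run window for Prometheus' built-in `ALERTS{alertstate="firing"}` series. Any series with a sample inside the window means that alert fired at some point during the run, even if it has since resolved. An instant query at the end of the run would only show the alerts still firing at that moment. The function name and the alert names in the usage data are hypothetical.

```python
from typing import Any


def alerts_fired_during(range_response: dict[str, Any], start: float, end: float) -> set[str]:
    """Return the names of alerts that had at least one firing sample in [start, end].

    `range_response` is assumed to be the decoded JSON body of a
    Prometheus-compatible /api/v1/query_range call for ALERTS{alertstate="firing"}.
    """
    if range_response.get("status") != "success":
        raise ValueError("query did not succeed")
    data = range_response["data"]
    if data.get("resultType") != "matrix":
        raise ValueError("expected a range query (matrix) result")

    fired: set[str] = set()
    for series in data["result"]:
        labels = series["metric"]
        if labels.get("alertstate") != "firing":
            continue  # ignore pending alerts if the query was broader
        # Each sample is [unix_timestamp, "value"]; a sample inside the window
        # means the alert was firing at that time, even if it resolved later.
        if any(start <= float(ts) <= end for ts, _value in series["values"]):
            fired.add(labels["alertname"])
    return fired


# Usage with a hand-written response (alert names are hypothetical):
response = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {"metric": {"__name__": "ALERTS", "alertname": "ExampleAlertA", "alertstate": "firing"},
             "values": [[1000, "1"], [1030, "1"]]},
            {"metric": {"__name__": "ALERTS", "alertname": "ExampleAlertB", "alertstate": "firing"},
             "values": [[5000, "1"]]},
        ],
    },
}
print(sorted(alerts_fired_during(response, start=900, end=2000)))  # ['ExampleAlertA']
```

One caveat with this approach: the range query's `step` limits its resolution, so an alert that fires and resolves between two steps could be missed. An instant query on `max_over_time(ALERTS{alertstate="firing"}[<run duration>])` is one alternative that looks at every stored sample in the window.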

This MR depends on:

Conclusive testing

  • in pipelines with "🛠️cluster-maxunavailable-force-disable 🟢cluster-maxunavailable-alert", the cluster-maxunavailable functionality is forcibly turned off, and the resulting violation of the "never more than 1 unavailable machine" guarantee is correctly caught by the mgmt-thanos-alert CI job
  • in pipelines where the cluster-maxunavailable functionality is enabled (CAPO jobs with "🛠️cluster-maxunavailable-force-enable", or capm3 jobs which have it turned on by default) the mgmt-thanos-alert CI job does not fail
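The guarantee exercised by these pipelines can be stated as an invariant: at no point in time may a cluster have more than one unavailable machine. Below is a minimal sketch of that check in plain Python. It assumes per-machine availability samples taken at common timestamps. It is not the Prometheus alert rule added by this MR, whose metric and alert names are not reproduced here. The `MachineSample` type and the function name are hypothetical. Conceptually, the alert does the same thing: count unavailable machines per cluster at each evaluation and fire when the count exceeds 1.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class MachineSample(NamedTuple):
    """One observation of a CAPI Machine (hypothetical data shape)."""
    timestamp: float
    cluster: str
    machine: str
    available: bool


def max_unavailable_violations(
    samples: Iterable[MachineSample], max_unavailable: int = 1
) -> list[tuple[float, str, int]]:
    """Return (timestamp, cluster, unavailable_count) wherever the count exceeds max_unavailable."""
    unavailable: dict[tuple[float, str], int] = defaultdict(int)
    for s in samples:
        if not s.available:
            unavailable[(s.timestamp, s.cluster)] += 1
    return sorted(
        (ts, cluster, count)
        for (ts, cluster), count in unavailable.items()
        if count > max_unavailable
    )


# A rolling update that takes down two control-plane machines at once (t=120)
# violates the guarantee; one machine at a time (t=60) does not.
samples = [
    MachineSample(60, "mgmt", "cp-0", False),
    MachineSample(60, "mgmt", "cp-1", True),
    MachineSample(60, "mgmt", "cp-2", True),
    MachineSample(120, "mgmt", "cp-0", False),
    MachineSample(120, "mgmt", "cp-1", False),
    MachineSample(120, "mgmt", "cp-2", True),
]
print(max_unavailable_violations(samples))  # [(120, 'mgmt', 2)]
```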

References

This MR addresses #3234

CI configuration

Below you can choose test deployment variants to run in this MR's CI.


Legend:

Icon | Meaning | Available values
☁️ | Infra Provider | capd, capo, capm3
🚀 | Bootstrap Provider | kubeadm (alias kadm), rke2, okd, ck8s
🐧 | Node OS | ubuntu, suse, na, leapmicro
🛠️ | Deployment Options | light-deploy, dev-sources, ha, misc, maxsurge-0, logging, no-logging, cilium
🎬 | Pipeline Scenarios | Available scenario list and description
🟢 | Enabled units | Any available unit name; by default applies to both management and workload clusters. Can be prefixed with mgmt: or wkld: to apply only to a specific cluster type
🏗️ | Target platform | Can be used to select a specific deployment environment (e.g. real-bmh for capm3)

  • 🎬 preview ☁️ capd 🚀 kadm 🐧 ubuntu

  • 🎬 preview ☁️ capo 🚀 rke2 🐧 suse

  • 🎬 preview ☁️ capm3 🚀 rke2 🐧 ubuntu

  • ☁️ capd 🚀 kadm 🛠️ light-deploy 🐧 ubuntu

  • ☁️ capd 🚀 rke2 🛠️ light-deploy 🐧 suse

  • ☁️ capo 🚀 rke2 🐧 suse

  • ☁️ capo 🚀 rke2 🐧 leapmicro

  • ☁️ capo 🚀 kadm 🐧 ubuntu

  • ☁️ capo 🚀 kadm 🐧 ubuntu 🟢 neuvector,mgmt:harbor

  • ☁️ capo 🚀 rke2 🎬 rolling-update 🛠️ ha 🐧 ubuntu

  • ☁️ capo 🚀 kadm 🎬 wkld-k8s-upgrade 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🎬 rolling-update-no-wkld 🛠️ ha,maxsurge-0,cluster-maxunavailable-force-disable 🟢 cluster-maxunavailable-alert 🐧 suse

  • ☁️ capo 🚀 rke2 🎬 rolling-update-no-wkld 🛠️ ha,maxsurge-0,cluster-maxunavailable-force-enable 🐧 suse

  • ☁️ capo 🚀 kadm 🎬 rolling-update-no-wkld 🛠️ ha,maxsurge-0,cluster-maxunavailable-force-enable 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🎬 sylva-upgrade 🛠️ ha 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🎬 sylva-upgrade-from-1.6.x 🛠️ ha,misc 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🛠️ ha,misc 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🛠️ ha,misc,openbao 🐧 suse

  • ☁️ capo 🚀 rke2 🐧 suse 🎬 upgrade-from-prev-tag

  • ☁️ capm3 🚀 rke2 🐧 suse

  • ☁️ capm3 🚀 kadm 🐧 ubuntu

  • ☁️ capm3 🚀 ck8s 🐧 ubuntu

  • ☁️ capm3 🚀 kadm 🎬 rolling-update-no-wkld 🛠️ ha,misc 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🎬 wkld-k8s-upgrade 🛠️ ha 🐧 suse

  • ☁️ capm3 🚀 kadm 🎬 rolling-update 🛠️ ha 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🎬 rolling-update-no-wkld 🛠️ ha 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🎬 rolling-update-no-wkld 🛠️ ha,cluster-maxunavailable-force-disable 🟢 cluster-maxunavailable-alert 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🎬 upgrade-from-prev-release-branch 🛠️ ha 🐧 suse

  • ☁️ capm3 🚀 rke2 🛠️ misc,ha 🐧 suse

  • ☁️ capm3 🚀 rke2 🎬 sylva-upgrade 🛠️ ha 🐧 suse

  • ☁️ capm3 🚀 kadm 🎬 sylva-upgrade 🛠️ ha 🐧 suse

  • ☁️ capm3 🚀 kadm 🎬 rolling-update 🛠️ ha 🐧 suse

  • ☁️ capm3 🚀 ck8s 🎬 rolling-update 🛠️ ha 🐧 ubuntu

  • ☁️ capm3 🚀 rke2|okd 🎬 no-update 🐧 ubuntu|na

  • ☁️ capm3 🚀 rke2 🐧 suse 🎬 upgrade-from-release-1.5

  • ☁️ capm3 🚀 rke2 🐧 suse 🎬 upgrade-to-main
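Each variant line above combines the legend's icons with comma-separated values. To make that mapping explicit, here is a minimal, hypothetical parser. It is not the code the Sylva CI uses to interpret this configuration. It splits a line on the legend icons (tolerating the optional U+FE0F emoji variation selector and missing spaces), expands the `kadm` alias, and resolves the `mgmt:`/`wkld:` unit prefixes described in the legend. Values joined with `|` (such as `rke2|okd`) are kept as a single value, since their exact semantics are not spelled out in the legend.

```python
import re

# Icon -> field, from the legend. The optional U+FE0F variation selector
# that follows some of these emoji is consumed by the regex below.
ICON_FIELDS = {
    "\u2601": "infra",          # ☁️ Infra Provider
    "\U0001F680": "bootstrap",  # 🚀 Bootstrap Provider
    "\U0001F427": "os",         # 🐧 Node OS
    "\U0001F6E0": "options",    # 🛠️ Deployment Options
    "\U0001F3AC": "scenario",   # 🎬 Pipeline Scenarios
    "\U0001F7E2": "units",      # 🟢 Enabled units
    "\U0001F3D7": "platform",   # 🏗️ Target platform
}
ALIASES = {"kadm": "kubeadm"}  # alias given in the legend

_ICON_RE = re.compile("(" + "|".join(map(re.escape, ICON_FIELDS)) + ")\ufe0f?")


def parse_variant(line: str) -> dict[str, list[str]]:
    """Split one variant line into {field: [values]} (hypothetical helper)."""
    parts = _ICON_RE.split(line)
    # With one capturing group, re.split yields [prefix, icon, value, icon, value, ...].
    result: dict[str, list[str]] = {}
    for icon, raw in zip(parts[1::2], parts[2::2]):
        values = [v.strip() for v in raw.split(",") if v.strip()]
        result[ICON_FIELDS[icon]] = [ALIASES.get(v, v) for v in values]
    return result


def unit_scopes(units: list[str]) -> dict[str, tuple[str, ...]]:
    """Map each enabled unit to the cluster types it applies to."""
    scopes: dict[str, tuple[str, ...]] = {}
    for unit in units:
        prefix, sep, name = unit.partition(":")
        if sep and prefix in ("mgmt", "wkld"):
            scopes[name] = (prefix,)
        else:
            scopes[unit] = ("mgmt", "wkld")  # default: both cluster types
    return scopes


variant = parse_variant("☁️ capo 🚀 kadm 🐧 ubuntu 🟢 neuvector,mgmt:harbor")
print(variant["bootstrap"], unit_scopes(variant["units"]))
# ['kubeadm'] {'neuvector': ('mgmt', 'wkld'), 'harbor': ('mgmt',)}
```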

Global config for deployment pipelines

  • autorun pipelines
  • allow failure on pipelines
  • record sylvactl events

Notes:

  • Enabling autorun makes deployment pipelines run automatically, without human interaction.
  • Disabling allow failure makes deployment pipelines mandatory for pipeline success.
  • If both autorun and allow failure are disabled, deployment pipelines must be triggered manually but still block the pipeline.

Be aware: after a configuration change, the pipeline is not triggered automatically. Run it manually (by clicking the "Run pipeline" button in the Pipelines tab) or push new code.

Edited by Thomas Morin
