integrate cluster-maxunavailable refactoring with associated CI testing
This MR integrates sylva-projects/sylva-elements/misc-controllers-suite!99 (merged).
The context is a refactoring that:
- addresses #3234
- improves CI testing so that there is an automated way of checking that this controller does its job properly
  - this is done by:
    - introducing Prometheus metrics for CAPI Machines
    - defining a Prometheus alert for the situation where the cluster-maxunavailable controller has not done its job
    - refactoring the CI job that checks Prometheus alerts so that it looks at alerts that fired at any point during the run (instead of only the alerts that were firing when the job ran)
  - (this builds on !6575 (merged))
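The difference between "alerts firing now" and "alerts that fired during the run" can be sketched as follows. This is a minimal illustration, not the code of the actual CI job: it assumes the alert store exposes the Prometheus-compatible HTTP API (`/api/v1/query_range`) and the standard `ALERTS` series. The function names, the base URL and the alert name in the usage note are hypothetical.

```python
from urllib.parse import urlencode


def build_range_query_url(base_url, start, end, step="30s"):
    """Build a range-query URL covering the whole run window.

    `base_url` is a hypothetical Prometheus/Thanos endpoint; `start` and `end`
    are Unix timestamps delimiting the CI run.
    """
    params = {
        # ALERTS is the synthetic series Prometheus records for pending/firing alerts.
        "query": 'ALERTS{alertstate="firing"}',
        "start": start,
        "end": end,
        "step": step,
    }
    return f"{base_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}"


def alerts_fired_during(response):
    """Return the names of alerts with at least one firing sample in the window.

    `response` is the decoded JSON body of a range query (resultType "matrix").
    An alert that fired and then resolved before the check still shows up here,
    which an instant query at check time would miss.
    """
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    fired = set()
    for series in response["data"]["result"]:
        if series.get("values"):
            fired.add(series["metric"].get("alertname", "<unknown>"))
    return fired
```

For example, a response containing a single series labelled `alertname="HypotheticalMaxUnavailableAlert"` with two samples yields `{"HypotheticalMaxUnavailableAlert"}`, even if that alert is no longer firing when the job runs.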
This MR depends on:
- !6575 (merged)
- sylva-projects/sylva-elements/misc-controllers-suite!99 (merged)
- sylva-projects/sylva-elements/ci-tooling/ci-deployment-values!339 (merged)
  - to be able to forcefully turn cluster-maxunavailable on/off
  - this is used within "🛠️ cluster-maxunavailable-force-disable 🟢 cluster-maxunavailable-alert" to confirm that when the controller is disabled, the alert fires
Conclusive testing
- in pipelines with "🛠️ cluster-maxunavailable-force-disable 🟢 cluster-maxunavailable-alert", the cluster-maxunavailable functionality is forcefully turned off and the violation of the "never more than 1 unavailable machine" guarantee is correctly caught by the mgmt-thanos-alert CI job
- in pipelines where the cluster-maxunavailable functionality is enabled (CAPO jobs with "🛠️ cluster-maxunavailable-force-enable", or capm3 jobs which have it turned on by default), the mgmt-thanos-alert CI job does not fail
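To make the "never more than 1 unavailable machine" guarantee concrete, here is a minimal sketch of the condition being checked. It is not the actual alert expression from this MR: the data shape (timestamped snapshots mapping machine names to availability) and the function name are assumptions made for illustration only.

```python
def find_violations(snapshots, max_unavailable=1):
    """Return (timestamp, unavailable_machines) for every snapshot that breaks the guarantee.

    `snapshots` is an iterable of (timestamp, {machine_name: is_available}) pairs,
    a hypothetical stand-in for per-Machine metrics sampled over a rolling update.
    """
    violations = []
    for timestamp, machines in snapshots:
        unavailable = sorted(name for name, ok in machines.items() if not ok)
        # The guarantee holds as long as at most `max_unavailable` machines are down.
        if len(unavailable) > max_unavailable:
            violations.append((timestamp, unavailable))
    return violations
```

With the controller enabled, a rolling update should produce no violations. With it forcefully disabled, a snapshot where two machines are down at once would be reported.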
References
This MR addresses #3234
CI configuration
Below you can choose test deployment variants to run in this MR's CI.
Click to open CI configuration
Legend:
| Icon | Meaning | Available values |
|---|---|---|
| ☁️ | Infra Provider | capd, capo, capm3 |
| 🚀 | Bootstrap Provider | kubeadm (alias kadm), rke2, okd, ck8s |
| 🐧 | Node OS | ubuntu, suse, na, leapmicro |
| 🛠️ | Deployment Options | light-deploy, dev-sources, ha, misc, maxsurge-0, logging, no-logging, cilium |
| 🎬 | Pipeline Scenarios | Available scenario list and description |
| 🟢 | Enabled units | Any available unit name; by default it applies to both management and workload clusters. Can be prefixed with mgmt: or wkld: to apply only to a specific cluster type |
| | Target platform | Can be used to select a specific deployment environment (e.g. real-bmh for capm3) |
- 🎬 preview ☁️ capd 🚀 kadm 🐧 ubuntu
- 🎬 preview ☁️ capo 🚀 rke2 🐧 suse
- 🎬 preview ☁️ capm3 🚀 rke2 🐧 ubuntu
- ☁️ capd 🚀 kadm 🛠️ light-deploy 🐧 ubuntu
- ☁️ capd 🚀 rke2 🛠️ light-deploy 🐧 suse
- ☁️ capo 🚀 rke2 🐧 suse
- ☁️ capo 🚀 rke2 🐧 leapmicro
- ☁️ capo 🚀 kadm 🐧 ubuntu
- ☁️ capo 🚀 kadm 🐧 ubuntu 🟢 neuvector,mgmt:harbor
- ☁️ capo 🚀 rke2 🎬 rolling-update 🛠️ ha 🐧 ubuntu
- ☁️ capo 🚀 kadm 🎬 wkld-k8s-upgrade 🐧 ubuntu
- ☁️ capo 🚀 rke2 🎬 rolling-update-no-wkld 🛠️ ha,maxsurge-0,cluster-maxunavailable-force-disable 🟢 cluster-maxunavailable-alert 🐧 suse
- ☁️ capo 🚀 rke2 🎬 rolling-update-no-wkld 🛠️ ha,maxsurge-0,cluster-maxunavailable-force-enable 🐧 suse
- ☁️ capo 🚀 kadm 🎬 rolling-update-no-wkld 🛠️ ha,maxsurge-0,cluster-maxunavailable-force-enable 🐧 ubuntu
- ☁️ capo 🚀 rke2 🎬 sylva-upgrade 🛠️ ha 🐧 ubuntu
- ☁️ capo 🚀 rke2 🎬 sylva-upgrade-from-1.6.x 🛠️ ha,misc 🐧 ubuntu
- ☁️ capo 🚀 rke2 🛠️ ha,misc 🐧 ubuntu
- ☁️ capo 🚀 rke2 🛠️ ha,misc,openbao 🐧 suse
- ☁️ capo 🚀 rke2 🐧 suse 🎬 upgrade-from-prev-tag
- ☁️ capm3 🚀 rke2 🐧 suse
- ☁️ capm3 🚀 kadm 🐧 ubuntu
- ☁️ capm3 🚀 ck8s 🐧 ubuntu
- ☁️ capm3 🚀 kadm 🎬 rolling-update-no-wkld 🛠️ ha,misc 🐧 ubuntu
- ☁️ capm3 🚀 rke2 🎬 wkld-k8s-upgrade 🛠️ ha 🐧 suse
- ☁️ capm3 🚀 kadm 🎬 rolling-update 🛠️ ha 🐧 ubuntu
- ☁️ capm3 🚀 rke2 🎬 rolling-update-no-wkld 🛠️ ha 🐧 ubuntu
- ☁️ capm3 🚀 rke2 🎬 rolling-update-no-wkld 🛠️ ha,cluster-maxunavailable-force-disable 🟢 cluster-maxunavailable-alert 🐧 ubuntu
- ☁️ capm3 🚀 rke2 🎬 upgrade-from-prev-release-branch 🛠️ ha 🐧 suse
- ☁️ capm3 🚀 rke2 🛠️ misc,ha 🐧 suse
- ☁️ capm3 🚀 rke2 🎬 sylva-upgrade 🛠️ ha 🐧 suse
- ☁️ capm3 🚀 kadm 🎬 sylva-upgrade 🛠️ ha 🐧 suse
- ☁️ capm3 🚀 kadm 🎬 rolling-update 🛠️ ha 🐧 suse
- ☁️ capm3 🚀 ck8s 🎬 rolling-update 🛠️ ha 🐧 ubuntu
- ☁️ capm3 🚀 rke2|okd 🎬 no-update 🐧 ubuntu|na
- ☁️ capm3 🚀 rke2 🐧 suse 🎬 upgrade-from-release-1.5
- ☁️ capm3 🚀 rke2 🐧 suse 🎬 upgrade-to-main
Global config for deployment pipelines
- autorun pipelines
- allow failure on pipelines
- record sylvactl events
Notes:
- Enabling `autorun` makes deployment pipelines run automatically, without human interaction.
- Disabling `allow failure` makes deployment pipelines mandatory for pipeline success.
- If both `autorun` and `allow failure` are disabled, deployment pipelines need manual triggering but still block the pipeline.
Be aware: after a configuration change, the pipeline is not triggered automatically.
Please run it manually (by clicking the run pipeline button in the Pipelines tab) or push new code.