make the removal of root-dependency resources more robust

This MR makes the removal of root-dependency Kustomizations/HelmReleases more robust.

Context: this MR was initially worked on in the context of #2837 (closed). While analyzing that issue we initially thought that an unwanted deletion of the root-dependency resources had happened; we ultimately understood that the cause was different, but we concluded at the time that the code removing the root-dependency resources could be made more robust.

We currently have two places where we remove resources related to the root-dependency:

  • the sylva-units-delete-root-dependencies Job, a sylva-units pre-upgrade hook (the most likely cause of the events described in #2837 (closed))
    • it needs to delete only old root-dependency Kustomizations/HelmReleases, but today it deletes all of them (see the sketch after this list)
    • most of the time this is fine, because all existing resources are old ones: the pre-upgrade hook runs before anything new is created, so no ks/hr resources exist yet for the "future" version, and the ones for the current version don't exist yet either, precisely because we're in a pre-upgrade hook
    • but if such a Job were to keep running, even after the next version of the sylva-units HelmRelease is created, then it would delete the "future" resources
    • (note that we don't actually know whether the way Helm and the FluxCD HelmRelease controller work would allow such a scenario to happen)
  • the root-dependency-check Job
    • this one already carefully avoids deleting the resources for the current release
    • but there might be a scenario where, if a Job Pod takes a long time to run, it ends up running in parallel with its counterpart for a more recent release and deletes the ks/hr of that more recent release (we're unsure whether this can really happen)
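
For illustration, here is a minimal sketch of the kind of cleanup the pre-upgrade hook performs today, i.e. deleting every root-dependency ks/hr it finds. The namespace and label selector are assumptions made up for this example, not the ones actually used by the chart.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the current pre-upgrade cleanup: it deletes *all*
# root-dependency Kustomizations/HelmReleases, with no notion of which release
# they belong to. Namespace and label selector are illustrative assumptions.
set -euo pipefail

NAMESPACE="sylva-system"
SELECTOR="app.kubernetes.io/component=root-dependency"

for kind in kustomizations.kustomize.toolkit.fluxcd.io helmreleases.helm.toolkit.fluxcd.io; do
  kubectl -n "$NAMESPACE" delete "$kind" -l "$SELECTOR" --ignore-not-found
done
```

If such a Job lingered after a newer sylva-units release had started creating its own root-dependency resources, this unconditional deletion would take those with it, which is the scenario this MR guards against.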

💡 hint for reviewers: this MR will be easier to review one commit at a time 💡

What this MR does is:

  • (3rd commit) ensure that when we delete root-dependency ks/hr resources, we always pick only the ones for strictly older versions; to achieve this:
    • the root-dependency-check Job script now does a "< current" comparison instead of a "!= current" test (see the sketch after this list)
    • the sylva-units-delete-root-dependencies Job now reuses that exact same code
  • (2nd commit) some refactoring was done to allow sharing that code
  • (1st commit) a cleanup of old, now-useless code
  • (4th commit) a revert of !5428 (merged), the first attempt at fixing #2837 (closed)
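
To illustrate the "strictly older" rule of the 3rd commit, the sketch below shows one way a Job script could compare versions and delete only older resources. The version label, the CURRENT_VERSION variable, the namespace/selector and the use of sort -V are assumptions made for this example, not the actual code shared by the two Jobs.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: delete root-dependency ks/hr resources only when their
# version is *strictly lower* than the current release version ("< current"
# instead of "!= current"). Label name, namespace and CURRENT_VERSION are
# illustrative assumptions; sort -V is assumed to be available in the Job image.
set -euo pipefail

NAMESPACE="sylva-system"
SELECTOR="app.kubernetes.io/component=root-dependency"
CURRENT_VERSION="${CURRENT_VERSION:?must be set to the release version being deployed}"

# True if $1 sorts strictly before $2 with version-aware ordering.
version_lt() {
  [ "$1" != "$2" ] && [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

for kind in kustomizations.kustomize.toolkit.fluxcd.io helmreleases.helm.toolkit.fluxcd.io; do
  # Print "<name> <version>" for each matching resource; the version is assumed
  # to be stored in a (hypothetical) sylva-units-version label.
  kubectl -n "$NAMESPACE" get "$kind" -l "$SELECTOR" --no-headers \
    -o custom-columns='NAME:.metadata.name,VERSION:.metadata.labels.sylva-units-version' |
  while read -r name version; do
    # Resources for the current release, for newer releases, or without a
    # version label ("<none>") are left untouched; only strictly older ones go.
    if [ "$version" != "<none>" ] && version_lt "$version" "$CURRENT_VERSION"; then
      kubectl -n "$NAMESPACE" delete "$kind" "$name" --ignore-not-found
    fi
  done
done
```

With a strict "< current" rule, a slow or lingering Job can at worst remove resources that every newer release also considers old, never the ones belonging to its own release or to a more recent one.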

CI configuration

Below you can choose test deployment variants to run in this MR's CI.

Click to open the CI configuration

Legend:

| Icon | Meaning | Available values |
| --- | --- | --- |
| ☁️ | Infra Provider | capd, capo, capm3 |
| 🚀 | Bootstrap Provider | kubeadm (alias kadm), rke2, okd, ck8s |
| 🐧 | Node OS | ubuntu, suse, na, leapmicro |
| 🛠️ | Deployment Options | light-deploy, dev-sources, ha, misc, maxsurge-0, logging, no-logging |
| 🎬 | Pipeline Scenarios | Available scenario list and description |
  • 🎬 preview ☁️ capd 🚀 kadm 🐧 ubuntu

  • 🎬 preview ☁️ capo 🚀 rke2 🐧 suse

  • 🎬 preview ☁️ capm3 🚀 rke2 🐧 ubuntu

  • ☁️ capd 🚀 kadm 🛠️ light-deploy 🐧 ubuntu

  • ☁️ capd 🚀 rke2 🛠️ light-deploy 🐧 suse

  • ☁️ capo 🚀 rke2 🐧 suse

  • ☁️ capo 🚀 rke2 🐧 leapmicro

  • ☁️ capo 🚀 kadm 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🎬 rolling-update 🛠️ ha 🐧 ubuntu

  • ☁️ capo 🚀 kadm 🎬 wkld-k8s-upgrade 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🎬 rolling-update-no-wkld 🛠️ ha 🐧 suse

  • ☁️ capo 🚀 rke2 🎬 sylva-upgrade-from-1.4.x 🛠️ ha 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🎬 sylva-upgrade-from-1.4.x 🛠️ ha,misc 🐧 ubuntu

  • ☁️ capo 🚀 rke2 🛠️ ha,misc 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🐧 suse

  • ☁️ capm3 🚀 kadm 🐧 ubuntu

  • ☁️ capm3 🚀 ck8s 🐧 ubuntu

  • ☁️ capm3 🚀 kadm 🎬 rolling-update-no-wkld 🛠️ ha,misc 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🎬 wkld-k8s-upgrade 🛠️ ha 🐧 suse

  • ☁️ capm3 🚀 kadm 🎬 rolling-update 🛠️ ha 🐧 ubuntu

  • ☁️ capm3 🚀 rke2 🎬 sylva-upgrade-from-1.4.x 🛠️ ha 🐧 suse

  • ☁️ capm3 🚀 rke2 🛠️ misc,ha 🐧 suse

  • ☁️ capm3 🚀 kadm 🎬 sylva-upgrade-from-1.4.x 🛠️ ha 🐧 suse

  • ☁️ capm3 🚀 kadm 🎬 rolling-update 🛠️ ha 🐧 suse

  • ☁️ capm3 🚀 ck8s 🎬 rolling-update 🛠️ ha 🐧 ubuntu

  • ☁️ capm3 🚀 rke2|okd 🎬 no-update 🐧 ubuntu|na

Global config for deployment pipelines

  • autorun pipelines
  • allow failure on pipelines
  • record sylvactl events

Notes:

  • Enabling autorun will make deployment pipelines run automatically, without human interaction
  • Disabling allow failure will make deployment pipelines mandatory for pipeline success
  • If both autorun and allow failure are disabled, deployment pipelines will need manual triggering, but will block the pipeline

Be aware: after a configuration change, the pipeline is not triggered automatically. Please run it manually (by clicking the "Run pipeline" button in the Pipelines tab) or push new code.
