add cluster-machines-ready unit / have all units wait on it during upgrades

What this MR does:

  • introduce a cluster-machines-ready unit that checks that all CAPI Machines are fully ready (and have joined the cluster)
  • have all units depend on this unit when we're doing an upgrade
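
To illustrate the kind of check the cluster-machines-ready unit performs, here is a minimal sketch in Python. It is not the unit's actual implementation: it assumes Machine objects have already been fetched as plain dicts, and it assumes "fully ready" means phase Running, a nodeRef set (the machine has joined the cluster), and a Ready condition with status "True". The function names are hypothetical.

```python
from typing import Any, Iterable


def machine_is_ready(machine: dict[str, Any]) -> bool:
    """Return True if one CAPI Machine looks fully ready (assumed criteria)."""
    status = machine.get("status", {})
    # Index conditions by type, e.g. {"Ready": "True"}
    conditions = {c.get("type"): c.get("status") for c in status.get("conditions", [])}
    return (
        status.get("phase") == "Running"       # machine is provisioned and running
        and bool(status.get("nodeRef"))        # machine is bound to a cluster Node
        and conditions.get("Ready") == "True"  # CAPI reports the Machine as Ready
    )


def cluster_machines_ready(machines: Iterable[dict[str, Any]]) -> bool:
    """Return True only if there is at least one Machine and all are ready."""
    machines = list(machines)
    # An empty list is treated as "not ready" (an assumption of this sketch)
    return bool(machines) and all(machine_is_ready(m) for m in machines)
```

Treating an empty Machine list as not ready is a choice made for this sketch, so that a failed or empty listing does not let dependent units proceed.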

Motivations:

  • this is one of the two tentative partial mitigations for #1157 (closed): by waiting for control plane machines to be ready before performing the API actions triggered by unit upgrades, we reduce the likelihood of hitting #1157 (closed) -- for this purpose alone, waiting for control plane machines would be sufficient
  • generally speaking, I'm inclined to believe that running the CAPI node rolling update in parallel with all the other cluster upgrades increases the probability of encountering less-tested corner cases, race conditions, or side effects of readiness probes -- it means a lot of pod scheduling/eviction/deletion/creation happening at the same time -- for this reason this MR covers not only control plane nodes, but all nodes
Edited by Thomas Morin
