add cluster-machines-ready unit / have all units wait on it during upgrades
What this MR does:
- introduce a `cluster-machines-ready` unit that checks that all CAPI Machines are fully ready (and have joined the cluster)
- have all units depend on this unit when we're doing an upgrade
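The two bullets above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the MR's implementation. It assumes the Machines are available as parsed JSON objects (for example the `items` of `kubectl get machines.cluster.x-k8s.io -A -o json`). It treats a Machine as ready when its `status.phase` is `Running`, it has a `status.nodeRef` (i.e. it has joined the cluster as a Node), and its `Ready` condition is `True`. Units are modelled as a hypothetical map from unit name to its list of dependencies. The helper names (`machine_is_ready`, `all_machines_ready`, `wire_upgrade_gate`) are hypothetical.

```python
from typing import Any

# A CAPI Machine object as parsed from JSON (assumption: v1beta1-style status fields).
Machine = dict[str, Any]

GATE_UNIT = "cluster-machines-ready"


def condition_is_true(machine: Machine, cond_type: str) -> bool:
    """Return True if the Machine has a condition of this type with status "True"."""
    status = machine.get("status") or {}
    for cond in status.get("conditions") or []:
        if cond.get("type") == cond_type:
            return cond.get("status") == "True"
    return False


def machine_is_ready(machine: Machine) -> bool:
    """Hypothetical readiness rule: Running phase, joined as a Node, Ready condition True."""
    status = machine.get("status") or {}
    joined = status.get("nodeRef") is not None
    return status.get("phase") == "Running" and joined and condition_is_true(machine, "Ready")


def all_machines_ready(machines: list[Machine]) -> bool:
    """Gate check: every Machine must be ready; an empty list is treated as not ready."""
    return bool(machines) and all(machine_is_ready(m) for m in machines)


def wire_upgrade_gate(
    units: dict[str, list[str]], upgrading: bool, gate: str = GATE_UNIT
) -> dict[str, list[str]]:
    """During an upgrade, make every unit (except the gate itself) depend on the gate."""
    wired: dict[str, list[str]] = {}
    for name, deps in units.items():
        new_deps = list(deps)  # copy so the input mapping is left untouched
        if upgrading and name != gate and gate not in new_deps:
            new_deps.append(gate)
        wired[name] = new_deps
    return wired
```

Outside of an upgrade the dependency graph is returned unchanged, so the gate only serializes work while nodes are being rolled. Excluding the gate from its own dependencies avoids a self-dependency cycle.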
Motivations:
- this is one of the two tentative partial mitigations for #1157 (closed): by waiting for control plane machines to be ready before doing all the API actions triggered by unit upgrades, we reduce the likelihood of hitting #1157 (closed); for this purpose alone, waiting for control plane machines would be sufficient
- more generally, I'm inclined to believe that running the CAPI node rolling update in parallel with all the other cluster upgrades increases the probability of hitting less-tested corner cases, race conditions, or side effects of readiness probes, because a lot of pod scheduling/eviction/deletion/creation then happens at the same time; for this reason, this MR covers not only control plane nodes but all nodes
Edited by Thomas Morin