rework cluster-machines-ready to fix kubeadm/calico issue for workload clusters
This resolves #1213 (closed) (a different solution than what was attempted in !2162 (closed)).
TL;DR
We have issue #1213 (closed) because some units that should sometimes not depend on cluster-machines-ready
are depending on it, and the causes for that is that (a) we haven't expressed this conditional dependency in the right way and (b) we haven't expressed in our code a functional explicit dependency (with kubeadm, Machine don't become ready until Calico is installed).
Details
The conditional dependency on "cluster-machines-ready" was introduced as "any unit that the cluster
unit does not depend on, should depend on cluster-machines-ready
". The "that the cluster
unit does not depend on" was meant to avoid a dependency loop such as: cluster
depends on unit X, X depends on Y, Y depends on cluster-machines-ready
... which depends on cluster
.
In fact to avoid such dependency loops, a better condition would be "any unit that the cluster-machines-ready
unit does not depend on, should depend on cluster-machines-ready
": simpler, straight to the point.
Both conditions are mostly equivalent because most units are either units that the cluster
unit depends on, or units that do not and would depend on cluster-machines-ready
... but there are exceptions to this "most", and those exceptions are the reason why !2044 (merged) was done, and the reason why issues still remain (#1213 (closed)).
Those exceptions are the Calico-related units: calico
, calico-crd
and namespace-defs
(on which calico units depend): the cluster
unit does not depend on those units (so baring other considerations, they would depend on cluster-machines-ready
), but with Kubeadm, on first installation, the cluster-machines-ready
unit will not come up before Calico is installed (there is an error condition on CAPI Machines if Calico isn't installed). Said otherwise, in this specific combination/phase, we in fact have a dependency of cluster-machines-ready
on calico
. This dependency was never expressed explicitly in our code.
If we make this dependency explicit, then we can use "any unit that the cluster-machines-ready
unit does not depend on, should depend on cluster-machines-ready
" as the criteria for a unit to depend or not on cluster-machines-ready
, and this solves #1213 (closed) because on the first installation, the namespace-defs
unit will not depend on cluster-machines-ready
/cluster-ready
/cluster
... because namespace-defs
will appear in the list of units that cluster-machines-ready
depends on, because for kubeadm/first-installation, calico
will be a dependency of cluster-machines-ready
and hence namespace-defs
as well.
What this MR does
- make explicit the dependency of
cluster-machines-ready
oncalico
, with Kubeadm, and on first installation - rework the
base-deps
dependency oncluster-machines-ready
as "any unit that thecluster-machines-ready
does not depend on, should depend oncluster-machines-ready
" - simplify the code, remove most of what !2044 (merged) had introduced for workload clusters