Skip to content

rework cluster-machines-ready to fix kubeadm/calico issue for workload clusters

Thomas Morin requested to merge kubeadm-fix-deps into main

This resolves #1213 (closed) (a different solution than what was attempted in !2162 (closed)).

TL;DR

We have issue #1213 (closed) because some units that should sometimes not depend on cluster-machines-ready are depending on it, and the causes for that is that (a) we haven't expressed this conditional dependency in the right way and (b) we haven't expressed in our code a functional explicit dependency (with kubeadm, Machine don't become ready until Calico is installed).

Details

The conditional dependency on "cluster-machines-ready" was introduced as "any unit that the cluster unit does not depend on, should depend on cluster-machines-ready". The "that the cluster unit does not depend on" was meant to avoid a dependency loop such as: cluster depends on unit X, X depends on Y, Y depends on cluster-machines-ready... which depends on cluster.

In fact to avoid such dependency loops, a better condition would be "any unit that the cluster-machines-ready unit does not depend on, should depend on cluster-machines-ready": simpler, straight to the point.

Both conditions are mostly equivalent because most units are either units that the cluster unit depends on, or units that do not and would depend on cluster-machines-ready ... but there are exceptions to this "most", and those exceptions are the reason why !2044 (merged) was done, and the reason why issues still remain (#1213 (closed)).

Those exceptions are the Calico-related units: calico, calico-crd and namespace-defs (on which calico units depend): the cluster unit does not depend on those units (so baring other considerations, they would depend on cluster-machines-ready), but with Kubeadm, on first installation, the cluster-machines-ready unit will not come up before Calico is installed (there is an error condition on CAPI Machines if Calico isn't installed). Said otherwise, in this specific combination/phase, we in fact have a dependency of cluster-machines-ready on calico. This dependency was never expressed explicitly in our code.

If we make this dependency explicit, then we can use "any unit that the cluster-machines-ready unit does not depend on, should depend on cluster-machines-ready" as the criteria for a unit to depend or not on cluster-machines-ready, and this solves #1213 (closed) because on the first installation, the namespace-defs unit will not depend on cluster-machines-ready/cluster-ready/cluster... because namespace-defs will appear in the list of units that cluster-machines-ready depends on, because for kubeadm/first-installation, calico will be a dependency of cluster-machines-ready and hence namespace-defs as well.

What this MR does

  • make explicit the dependency of cluster-machines-ready on calico, with Kubeadm, and on first installation
  • rework the base-deps dependency on cluster-machines-ready as "any unit that the cluster-machines-ready does not depend on, should depend on cluster-machines-ready"
  • simplify the code, remove most of what !2044 (merged) had introduced for workload clusters

/cc @rletrocquer @feleouet

Edited by Thomas Morin

Merge request reports