effectively set the dependency on 'cluster-machines-ready' unit, for relevant units
One issue slipped into !2032 (merged): I used the inexisting Release.Upgrade value instead of Release.IsUpgrade making the "have all units wait on it during upgrades" part of the MR ineffective. Credit to @feleouet for noticing the issue.
This MR corrects that.
However, when implementing this MR, the initial idea "have all units wait on cluster-machines-ready during upgrades" had to be refined into "have the units that the 'cluster' unit depends on reconcile before 'cluster', and have the rest of the units reconcile after all CAPI things are ready".
This is explained in this comment. Having "all units depend on cluster-machines-ready is just impossible: for instance a unit like cert-manager, on which the cluster unit indirectly depends on (because capX units depend on cert-manager) can't depend on cluster-machines-ready without creating a circular dependency.
-
The first attempt at an implementation of this MR consisted in introducing a named template that let us determine if a unit is among the units that the
clusterunit depends on, directly or indirectly. This was a recursive implementation, and getting it to work wasn't entirely trivial. This was implemented with the great help of @feleouet who patiently listened, challenged implementation approaches and provided a simple but powerful simplification (implemented separately in !2055 (merged)). The major drawback of this approach was that it was heavily recursive and very costly (for all unit inheriting from base-deps we were calling ahas-dependency-onnamed template, which was recursive and hence calling itself, and was also callingunit-def-from-templateswhich was also interpreting the unit and also callinghas-dependency-on). With this implemention, the runtime ofhelm templatewas multiplied by 3 (9.8s vs 3.3s on my laptop), and worse the Flux Helm controller was hitting CPU throttling and was in practice not able to compute the content of the manifests. -
The second implementation (current content of this MR) is doing things differently:
- the list of units that the
clusterunit has a dependency on, directly or indirectly, is computed once in_internal.cluster_unit_deps - in the
base-depsunit template, whether or not the current unit should depend on thecluster-machines-readyunit is determined using this pre-computed_internal.cluster_unit_depslist - for this approach to be possible,
_internal.cluster_unit_depscannot be computed in_internal: computing 'cluster_unit_deps' in _internal would result in fully interpreting 'unit_template.base-deps', resulting in interpreting 'depends_on.cluster-machines-ready' which itself relies on 'cluster_unit_deps', causing a variable interpretation loop - (with this approach
helm templateruns essentially as fast as before)
- the list of units that the
Contributes to #1157 (closed)