Refactor vsphere-cpi dependencies
What does this MR do and why?
Refactor the way the vsphere-cpi unit is deployed on CAPV clusters.
The CPI is responsible for the rollout of control plane nodes, so it must run as soon as the first control plane node has been created.
Basically the CAPV workflow is the following:
- The arg cloud-provider: external is passed to the kubelet (see here for kubeadm, here for RKE2) that taints the nodes with node.cloudprovider.kubernetes.io/uninitialized;
- The first control plane node is deployed with the taint above, the control plane cannot scale up (in case of HA);
- The vsphere-cpi is deployed on the cluster and removes the taint from the control plane node, allowing the CP rollout to go on.
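To make the sequence above concrete, here is a toy Python model of the taint handling. It is only a sketch: the Node class and the can_scale_up and cpi_reconcile helpers are hypothetical names made up for illustration, and the scale-up rule is a simplification of what the control plane provider actually checks.

```python
from dataclasses import dataclass, field

# Taint set on nodes when the kubelet runs with cloud-provider: external.
UNINITIALIZED_TAINT = "node.cloudprovider.kubernetes.io/uninitialized"


@dataclass
class Node:
    """Hypothetical stand-in for a control plane node."""
    name: str
    # Every new node starts with the uninitialized taint.
    taints: set = field(default_factory=lambda: {UNINITIALIZED_TAINT})

    @property
    def initialized(self) -> bool:
        return UNINITIALIZED_TAINT not in self.taints


def can_scale_up(nodes):
    """Simplified rule: add a node only once all existing nodes are initialized."""
    return bool(nodes) and all(node.initialized for node in nodes)


def cpi_reconcile(nodes):
    """What the CPI does in this model: remove the taint from each node."""
    for node in nodes:
        node.taints.discard(UNINITIALIZED_TAINT)


nodes = [Node("cp-0")]
blocked_before_cpi = not can_scale_up(nodes)  # first node is still tainted
cpi_reconcile(nodes)                          # vsphere-cpi gets deployed
unblocked_after_cpi = can_scale_up(nodes)     # rollout can go on
```

In this model the control plane stays at one node until cpi_reconcile has run, which is why the CPI cannot wait for the control plane to be fully ready.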
The proposed solution is to:
- Remove from the cluster unit the healthChecks on Cluster and KubeadmControlPlane/RKE2ControlPlane because they would never become Ready before the vsphere-cpi has been deployed;
- Make the vsphere-cpi depend on cluster
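The reason both changes are needed together can be illustrated with a small simulation. This is a hedged sketch and not how Flux actually reconciles: the reconcile function, the dictionary layout and control_plane_ready are all assumptions made for the example. A unit is deployed once all of its dependencies are ready, and becomes ready once deployed and its health check (if any) passes.

```python
def reconcile(units, max_rounds=10):
    """Return (deployed, ready) unit names after a bounded number of rounds.

    units maps a unit name to a dict with:
      depends_on:   names of units that must be ready before deployment
      health_check: callable(deployed) -> bool, or None for "ready once deployed"
    """
    deployed, ready = set(), set()
    for _ in range(max_rounds):
        progressed = False
        for name, spec in units.items():
            if name not in deployed and all(dep in ready for dep in spec["depends_on"]):
                deployed.add(name)
                progressed = True
            check = spec["health_check"]
            if name in deployed and name not in ready and (check is None or check(deployed)):
                ready.add(name)
                progressed = True
        if not progressed:
            break  # nothing changed in this round: stable state or deadlock
    return deployed, ready


def control_plane_ready(deployed):
    # In this model the control plane only becomes Ready once the CPI is deployed.
    return "vsphere-cpi" in deployed


# cluster keeps its health checks and vsphere-cpi depends on it: deadlock.
with_checks = {
    "cluster": {"depends_on": [], "health_check": control_plane_ready},
    "vsphere-cpi": {"depends_on": ["cluster"], "health_check": None},
}

# Health checks removed from cluster, as proposed here: both units get deployed.
without_checks = {
    "cluster": {"depends_on": [], "health_check": None},
    "vsphere-cpi": {"depends_on": ["cluster"], "health_check": None},
}

deployed_with_checks, _ = reconcile(with_checks)
deployed_without_checks, _ = reconcile(without_checks)
```

With the health checks in place, cluster never becomes ready and vsphere-cpi is never deployed; without them, the dependency alone is enough to order the two units.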
This way we should no longer need to set retries: -1 on the vsphere-cpi HelmRelease, as the dependency is handled correctly.
Future improvements
It would be great if the healthChecks that have been removed from cluster could be included in cluster-ready, to ensure we're deploying units only when the cluster is actually Ready, although cluster-machines-ready already does something very similar.
Related reference(s)
Closes #1508 (closed)
This is the continuation of the work in !2272 (closed) (unfortunately Teodoro has left the company). I'd suggest closing it in favor of this one.