Refactor vsphere-cpi dependencies

What does this MR do and why?

Refactor the way the vsphere-cpi unit is deployed on CAPV clusters.

The CPI is responsible for the rollout of control plane nodes, so it must run as soon as the first control plane node has been created.

Basically the CAPV workflow is the following:

  1. The arg cloud-provider: external is passed to the kubelet (see here for kubeadm, here for RKE2), which taints the nodes with node.cloudprovider.kubernetes.io/uninitialized;
  2. The first control plane node is deployed with the taint above, so the control plane cannot scale up (in case of HA);
  3. The vsphere-cpi is deployed on the cluster and removes the taint from the control plane node, allowing the CP rollout to go on.
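
The workflow above can be sketched as a toy model. This is only an illustration, not CAPV or CPI code: the helper names (`new_node`, `cpi_initialize`, `can_scale_up`) are hypothetical, and the scale-up rule is a simplifying assumption that the rollout is blocked while any existing control plane node still carries the taint.

```python
# Taint set on nodes when the kubelet runs with cloud-provider: external.
UNINITIALIZED_TAINT = "node.cloudprovider.kubernetes.io/uninitialized"


def new_node(name):
    """Hypothetical helper: a freshly registered node carries the taint."""
    return {"name": name, "taints": {UNINITIALIZED_TAINT}}


def cpi_initialize(node):
    """Stand-in for the vsphere-cpi initializing a node: the taint is removed."""
    node["taints"].discard(UNINITIALIZED_TAINT)


def can_scale_up(nodes):
    """Assumed rule: the rollout continues only if no node is still uninitialized."""
    return all(UNINITIALIZED_TAINT not in node["taints"] for node in nodes)


first = new_node("cp-0")
blocked = can_scale_up([first])    # step 2: taint still present
cpi_initialize(first)              # step 3: the CPI removes the taint
unblocked = can_scale_up([first])  # rollout can go on
```

In this model `blocked` is `False` and `unblocked` is `True`, which mirrors why the CPI has to be deployed right after the first control plane node exists.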

The proposed solution is to:

  1. Remove from the cluster unit the healthChecks on Cluster and KubeadmControlPlane/RKE2ControlPlane because they would never become Ready before the vsphere-cpi has been deployed;
  2. Make the vsphere-cpi depend on cluster.
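
To illustrate why both changes are needed together, here is a minimal sketch of the ordering problem using Python's standard `graphlib`. The graphs are a simplified assumption, not the real unit definitions: the edge from cluster to vsphere-cpi in the first graph stands in for the removed healthChecks, which made cluster effectively wait for the CPI.

```python
from graphlib import CycleError, TopologicalSorter


def deploy_order(depends_on):
    """Return a valid deployment order, or None if the dependencies deadlock."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError:
        return None


# Assumed model with the healthChecks kept: cluster only becomes Ready once
# the CPI runs, while the CPI waits for cluster, so each waits on the other.
with_health_checks = {
    "cluster": {"vsphere-cpi"},
    "vsphere-cpi": {"cluster"},
}

# Proposed layout: cluster has no readiness gate on the CPI,
# and vsphere-cpi simply depends on cluster.
proposed = {
    "cluster": set(),
    "vsphere-cpi": {"cluster"},
}

deadlocked = deploy_order(with_health_checks)
order = deploy_order(proposed)
```

Here `deadlocked` is `None` and `order` is `["cluster", "vsphere-cpi"]`: once the health checks are gone, the dependency alone gives a well-defined order.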

This way we should no longer need to set retries: -1 on the vsphere-cpi HelmRelease, as the dependency is handled correctly.

Future improvements

It would be great if the healthChecks that have been removed from cluster could be included in cluster-ready, to ensure we deploy units only when the cluster is actually Ready. Note that cluster-machines-ready already does something very similar.

Related reference(s)

Closes #1508 (closed)

This is the continuation of the work in !2272 (closed) (unfortunately Teodoro has left the company). I'd suggest closing it in favor of this one.

Edited by Federico Cicchiello
