Tigera-operator doesn't tolerate node.cloudprovider.kubernetes.io/uninitialized

Summary

RKE2 clusters on CAPV are broken, probably because of sylva-projects/sylva-elements/helm-charts/sylva-capi-cluster!391 (merged).

The tigera-operator pod can't be scheduled because it doesn't tolerate the taint node.cloudprovider.kubernetes.io/uninitialized: true, which is only removed once vsphere-cpi has been deployed on the cluster.
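
To make the scheduling failure concrete, here is a minimal Python sketch of the (simplified) Kubernetes taint/toleration matching rule: a pod only lands on a node if every NoSchedule or NoExecute taint on it is matched by one of the pod's tolerations. The toleration lists below are hypothetical and are not taken from the tigera-operator chart or from !391.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Taint:
    key: str
    value: str = ""
    effect: str = "NoSchedule"  # NoSchedule | PreferNoSchedule | NoExecute


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # an empty key together with Exists matches every taint key
    operator: str = "Equal"  # Equal | Exists
    value: str = ""
    effect: str = ""         # an empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified Kubernetes toleration matching."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if not tol.key:
        return tol.operator == "Exists"
    if tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True  # the taint value is ignored with Exists
    return tol.value == taint.value


def blocking_taints(tolerations: List[Toleration], taints: List[Taint]) -> List[Taint]:
    """Taints that keep the pod off the node; PreferNoSchedule is only a soft preference."""
    return [
        t for t in taints
        if t.effect != "PreferNoSchedule"
        and not any(tolerates(tol, t) for tol in tolerations)
    ]


node_taints = [
    Taint("node.cloudprovider.kubernetes.io/uninitialized", "true"),
    Taint("node.kubernetes.io/not-ready"),
]

# Hypothetical toleration sets, before and after adding the cloud-provider toleration.
without_fix = [Toleration("node.kubernetes.io/not-ready", operator="Exists")]
with_fix = without_fix + [
    Toleration("node.cloudprovider.kubernetes.io/uninitialized", operator="Exists")
]

print([t.key for t in blocking_taints(without_fix, node_taints)])
print(blocking_taints(with_fix, node_taints))
```

With only the not-ready toleration, the uninitialized taint is the one left blocking the pod; adding a toleration for it leaves nothing blocking.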

Deploying vsphere-cpi is Flux's job, but Flux cannot reach the apiServer because metallb is stuck until the CNI removes the node.kubernetes.io/not-ready taint. Since the CNI itself cannot come up while tigera-operator is unscheduled, we have a deadlock.
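
The deadlock can be written down as a "waits for" graph and checked for a cycle. This is a sketch using a plain depth-first search; the node names are illustrative, and the last edge assumes the CNI is Calico installed by tigera-operator.

```python
from typing import Dict, List, Optional

# "A": ["B"] reads "A cannot progress until B is done".
WAITS_FOR: Dict[str, List[str]] = {
    "tigera-operator scheduled": ["uninitialized taint removed"],
    "uninitialized taint removed": ["vsphere-cpi deployed"],
    "vsphere-cpi deployed": ["flux reaches apiServer"],
    "flux reaches apiServer": ["metallb running"],
    "metallb running": ["not-ready taint removed"],
    "not-ready taint removed": ["tigera-operator scheduled"],  # assumption: CNI comes from tigera-operator
}


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Depth-first search returning the first dependency cycle found, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path, so the slice from it is a cycle.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


print(" -> ".join(find_cycle(WAITS_FOR)))

# Tolerating the uninitialized taint removes tigera-operator's only dependency.
fixed = dict(WAITS_FOR, **{"tigera-operator scheduled": []})
print(find_cycle(fixed))
```

Removing any single edge breaks the cycle; the toleration is the edge that is cheapest to remove.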

Related references

The hypothesis is that sylva-projects/sylva-elements/helm-charts/sylva-capi-cluster!391 (merged) somehow overrides the tigera-operator toleration for node.cloudprovider.kubernetes.io/uninitialized: true, which would otherwise break the deadlock.
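
One way such an override can drop a toleration: Helm merges nested maps in values key by key but replaces lists wholesale, so supplying any `tolerations` list discards the chart's defaults. The sketch below mimics that behaviour in Python, assuming (not confirmed) that !391 ends up passing a `tolerations` list to tigera-operator through Helm values. The default and override values are hypothetical, and Helm's null-deletion rule is left out.

```python
import copy
from typing import Any, Dict


def coalesce(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Merge overrides into defaults: nested maps merge key by key, anything else (lists included) is replaced."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = coalesce(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical chart defaults that already tolerate the cloud-provider taint.
chart_defaults = {
    "tolerations": [
        {"key": "node.cloudprovider.kubernetes.io/uninitialized", "operator": "Exists", "effect": "NoSchedule"},
        {"key": "node.kubernetes.io/not-ready", "operator": "Exists", "effect": "NoSchedule"},
    ],
    "resources": {"requests": {"cpu": "100m"}},
}

# Hypothetical override that only meant to add one extra toleration.
override = {
    "tolerations": [
        {"key": "node-role.kubernetes.io/control-plane", "operator": "Exists", "effect": "NoSchedule"},
    ],
    "resources": {"requests": {"memory": "128Mi"}},
}

effective = coalesce(chart_defaults, override)
print([t["key"] for t in effective["tolerations"]])
print(effective["resources"])
```

The `resources` map keeps both keys, while the `tolerations` list ends up containing only the override entry, so the uninitialized toleration is lost. If that is what happens, the override would need to repeat the default tolerations explicitly.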

The bug does not occur on kubeadm clusters (!2540 (merged) makes the same change for kubeadm) because there kube-vip is deployed as a static pod running on the hostNetwork, so it does not depend on the CNI.
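
The kubeadm/RKE2 difference comes down to two properties: static pods are started by the kubelet without going through the scheduler, so NoSchedule taints do not hold them back, and hostNetwork pods use the node's network namespace, so they need no CNI. This is a simplified Python sketch; the `Workload` type and the flag values are illustrative, not read from the actual manifests.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Workload:
    name: str
    static_pod: bool           # started by the kubelet from a manifest file, no scheduler involved
    host_network: bool         # shares the node's network namespace, so no CNI is needed
    tolerates_not_ready: bool  # only relevant for scheduler-placed pods


def pre_cni_blockers(w: Workload) -> List[str]:
    """Return what keeps the workload from running while the CNI is not up yet (simplified model)."""
    blockers = []
    if not w.static_pod and not w.tolerates_not_ready:
        blockers.append("node.kubernetes.io/not-ready taint")
    if not w.host_network:
        blockers.append("pod network provided by the CNI")
    return blockers


# Illustrative flags: kube-vip as deployed on kubeadm clusters per this issue,
# and a generic scheduler-placed Deployment for contrast.
kube_vip = Workload("kube-vip", static_pod=True, host_network=True, tolerates_not_ready=False)
regular = Workload("regular-deployment", static_pod=False, host_network=False, tolerates_not_ready=False)

for w in (kube_vip, regular):
    print(w.name, pre_cni_blockers(w))
```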
