rancher-webhook blocked by Kyverno in HA deployments

While deploying an HA cluster, I noticed that rancher-webhook takes a long time to deploy. After some investigation, I saw that some helm-operation-XXXX pods were in Error status:

# kubectl --kubeconfig management-cluster-kubeconfig -n cattle-system get all
NAME                           READY   STATUS      RESTARTS   AGE
pod/helm-operation-95pjf       1/2     Error       0          20m
pod/helm-operation-ccmcd       1/2     Error       0          12m
pod/helm-operation-dlvfp       1/2     Error       0          7m50s

I found the same errors in some of our CI jobs, for example https://gitlab.com/sylva-projects/sylva-core/-/jobs/6801581881

The pod log shows that Kyverno is blocking the deployment:

Error: UPGRADE FAILED: failed to create resource: admission webhook "validate.kyverno.svc-fail" denied the request: 

resource Deployment/cattle-system/rancher-webhook was blocked due to the following policies 

pdb-minavailable-check:
  pdb-minavailable: The matching PodDisruptionBudget for this resource has its minAvailable
    value greater or equal to the replica count which is not permitted.

This seems to be caused by our rancher-webhook-pdb, which sets minAvailable to 1, combined with a race between two ClusterPolicies:

  • pdb-minavailable-check, which causes the above error because the initial replica count is 1, so minAvailable is equal to the replica count
  • rancher-webhook-replicas, which has to increase the replica count to 2
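
To make the suspected ordering problem concrete, here is a minimal sketch in Python. It is only a simplified model of the two policies, not the actual Kyverno logic: the function names are hypothetical, the deny condition is taken from the error message above (minAvailable greater than or equal to the replica count), and the target of 2 replicas comes from the description of rancher-webhook-replicas.

```python
# Simplified model of the two ClusterPolicies (hypothetical helper names,
# not Kyverno's real implementation).

def pdb_minavailable_denied(replicas: int, min_available: int) -> bool:
    """Model of pdb-minavailable-check: deny when minAvailable >= replicas."""
    return min_available >= replicas


def mutate_replicas(replicas: int, target: int = 2) -> int:
    """Model of rancher-webhook-replicas: raise the replica count to the target."""
    return max(replicas, target)


MIN_AVAILABLE = 1      # value set by rancher-webhook-pdb
INITIAL_REPLICAS = 1   # initial replica count of the rancher-webhook Deployment

# Case 1: the check sees the Deployment before the replica count is raised.
denied_before = pdb_minavailable_denied(INITIAL_REPLICAS, MIN_AVAILABLE)

# Case 2: the check sees the Deployment after the replica count is raised.
denied_after = pdb_minavailable_denied(
    mutate_replicas(INITIAL_REPLICAS), MIN_AVAILABLE
)

print(f"checked before replicas are raised: denied={denied_before}")
print(f"checked after replicas are raised:  denied={denied_after}")
```

In this model the request is denied only when the check runs against the initial replica count of 1, which matches the error seen in the helm-operation pods.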

cc: @feleouet

Edited May 13, 2024 by Thomas Morin