rancher-webhook blocked by Kyverno in HA deployments
While deploying an HA cluster, I noticed that rancher-webhook was taking a long time to deploy. After some investigation, I saw that several helm-operation-XXXX pods were in Error status:
# kubectl --kubeconfig management-cluster-kubeconfig -n cattle-system get all
NAME READY STATUS RESTARTS AGE
pod/helm-operation-95pjf 1/2 Error 0 20m
pod/helm-operation-ccmcd 1/2 Error 0 12m
pod/helm-operation-dlvfp 1/2 Error 0 7m50s
I found the same errors in some of our CI jobs, for example https://gitlab.com/sylva-projects/sylva-core/-/jobs/6801581881
The pod log shows that Kyverno is blocking the deployment:
Error: UPGRADE FAILED: failed to create resource: admission webhook "validate.kyverno.svc-fail" denied the request:
resource Deployment/cattle-system/rancher-webhook was blocked due to the following policies
pdb-minavailable-check:
pdb-minavailable: The matching PodDisruptionBudget for this resource has its minAvailable
value greater or equal to the replica count which is not permitted.
This seems to be caused by our rancher-webhook-pdb, which sets minAvailable to 1, combined with a race between two ClusterPolicies:
- pdb-minavailable-check, which causes the above error because the initial replica count is 1
- rancher-webhook-replicas, which has to increase the replica count to 2
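
To make the suspected race concrete, here is a minimal Python sketch of the arithmetic involved. It is a toy model and not Kyverno's actual implementation: the function names and the `validate_first` flag are hypothetical, and it simply assumes the check rejects a Deployment whenever minAvailable is greater than or equal to the replica count it sees at that moment, as the error message states.

```python
# Toy model of the suspected race; NOT Kyverno code.
# All names below are hypothetical and only illustrate the comparison
# described in the pdb-minavailable error message.

MIN_AVAILABLE = 1      # value set by rancher-webhook-pdb (from the issue)
INITIAL_REPLICAS = 1   # replica count before rancher-webhook-replicas applies
TARGET_REPLICAS = 2    # replica count after rancher-webhook-replicas applies


def pdb_minavailable_violation(replicas: int, min_available: int) -> bool:
    """Return True when minAvailable is greater than or equal to replicas."""
    return min_available >= replicas


def is_admitted(validate_first: bool) -> bool:
    """Return True if the Deployment would pass the check.

    validate_first=True models the check running before the replica
    count has been raised; False models it running afterwards.
    """
    replicas = INITIAL_REPLICAS if validate_first else TARGET_REPLICAS
    return not pdb_minavailable_violation(replicas, MIN_AVAILABLE)


if __name__ == "__main__":
    print("check runs before replicas are raised:", is_admitted(True))
    print("check runs after replicas are raised:", is_admitted(False))
```

If this model matches what happens in the cluster, the Deployment is rejected only when the check sees the initial replica count of 1, which would explain why the failure is intermittent.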
cc: @feleouet
Edited by Thomas Morin