A PDB with the default policy can prevent node draining

Having maxUnavailable or minAvailable properly set on a PDB is not, by itself, sufficient to allow draining a pod covered by that PDB.

With the default PDB settings, the pod also needs to be Ready -- https://kubernetes.io/docs/tasks/run-application/configure-pdb/#healthiness-of-a-pod

Since Kubernetes 1.27, there is a new spec.unhealthyPodEvictionPolicy field that can be set to AlwaysAllow, letting the eviction API consider unhealthy pods as eligible for eviction.
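For illustration, a minimal PDB with this field set could look like the following (the name and selector labels are hypothetical, not taken from any Sylva unit):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-pdb          # hypothetical name
spec:
  minAvailable: 1
  # Without this field, pods that are not Ready block eviction,
  # which can in turn block node draining.
  unhealthyPodEvictionPolicy: AlwaysAllow
  selector:
    matchLabels:
      app: example           # hypothetical selector
```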

We need to ensure that a Sylva cluster is not subject to node draining being blocked in this way.

List of things to do, or discuss:

  • for units that we package in Sylva and that define a PDB, set spec.unhealthyPodEvictionPolicy: AlwaysAllow
  • to ensure that we cover them all and don't regress, introduce a Kyverno Audit policy to detect PDBs that do not have this setting
  • we'll possibly have to make exceptions (for instance, I'm not sure that it would be safe to let the eviction API allow the eviction of a Longhorn PV instance that is not Ready, so we may need an exception there) -- the Kyverno policy should allow setting a label on a PDB to have it ignored by the policy
  • we should ensure that problematic PDBs are exposed to the monitoring layer so that cluster operators are aware of potential draining issues
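A sketch of what such a Kyverno Audit policy could look like, assuming a hypothetical exception label (the policy name, rule name, and label key below are placeholders, not agreed-upon names):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-pdb-unhealthy-eviction-policy   # hypothetical name
spec:
  validationFailureAction: Audit   # report only, don't block
  background: true                 # also scan existing PDBs
  rules:
    - name: check-unhealthy-pod-eviction-policy
      match:
        any:
          - resources:
              kinds:
                - PodDisruptionBudget
      exclude:
        any:
          - resources:
              selector:
                matchLabels:
                  # hypothetical opt-out label for justified exceptions
                  sylva/pdb-policy-exception: "true"
      validate:
        message: >-
          PDBs should set spec.unhealthyPodEvictionPolicy: AlwaysAllow
          to avoid blocking node drains on unhealthy pods.
        pattern:
          spec:
            unhealthyPodEvictionPolicy: AlwaysAllow
```

Since the policy runs in Audit mode with background scanning, violations would show up in PolicyReports, which could also feed the monitoring layer mentioned above.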

(related issue: #1560 (comment 2085803168))
