Evicted pods in EKS, DiskPressure on nodes

We periodically see evicted pods in EKS which require manual cleanup using something like:

kubectl get pods | awk '/Evicted/{ print $1}' | xargs kubectl delete pod

To improve overall environmental hygiene and minimize resource consumption, we should categorize these evictions to better understand failures and automate cleanup.
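As a starting point for automating the cleanup, here is a minimal sketch that extends the one-liner above to all namespaces. The function names `list_evicted` and `delete_evicted` and the `DRY_RUN` variable are hypothetical, and the filter assumes the default column layout of `kubectl get pods --all-namespaces --no-headers` (namespace, name, ready, status, ...).

```shell
# Filter `kubectl get pods --all-namespaces --no-headers` output (on stdin)
# down to "namespace name" pairs whose STATUS column is exactly "Evicted".
list_evicted() {
  awk '$4 == "Evicted" { print $1, $2 }'
}

# Delete every evicted pod in the cluster.
# Set DRY_RUN=1 to print what would be deleted instead of deleting it.
delete_evicted() {
  kubectl get pods --all-namespaces --no-headers \
    | list_evicted \
    | while read -r ns name; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
          echo "would delete ${ns}/${name}"
        else
          kubectl delete pod --namespace "$ns" "$name"
        fi
      done
}
```

Running `DRY_RUN=1 delete_evicted` first shows what would be removed. Matching on the exact column value avoids deleting a pod whose name merely contains "Evicted".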

Recently, this has often been caused by DiskPressure on the nodes in the cluster. We should confirm the size of the backing disks, the cause of the pressure, and probably update the cluster version as well.
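To confirm disk sizes and see which nodes currently report pressure, something like the following could work. This is a sketch only: the function names `node_disk_report` and `pressured_nodes` are hypothetical, and it assumes the kubelet reports the backing disk as `ephemeral-storage` in the node status.

```shell
# Print one line per node: name, DiskPressure condition status, and the
# ephemeral-storage capacity and allocatable values from the node status.
node_disk_report() {
  kubectl get nodes --no-headers -o custom-columns='NODE:.metadata.name,PRESSURE:.status.conditions[?(@.type=="DiskPressure")].status,CAPACITY:.status.capacity.ephemeral-storage,ALLOCATABLE:.status.allocatable.ephemeral-storage'
}

# Read node_disk_report output on stdin and print only the names of nodes
# whose DiskPressure condition is "True".
pressured_nodes() {
  awk '$2 == "True" { print $1 }'
}
```

`node_disk_report | pressured_nodes` then lists the nodes worth inspecting first.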

Suspects for disk fill:

  • logs
  • our images, and the many revisions of them.
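To check these suspects on an affected node, a rough sketch run on the node itself might look like this. The function name `disk_suspects` is hypothetical, and the default paths are assumptions about the node image and container runtime; adjust them to match what the nodes actually use.

```shell
# Print disk usage in KiB for each candidate directory that exists on this
# node, largest first. Directories that do not exist are skipped.
disk_suspects() {
  if [ "$#" -eq 0 ]; then
    # Assumed defaults: pod logs, kubelet data, and image storage for
    # either Docker or containerd.
    set -- /var/log/pods /var/lib/kubelet /var/lib/docker /var/lib/containerd
  fi
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      du -sk "$dir" 2>/dev/null
    fi
  done | sort -rn
}
```

Calling `disk_suspects` with no arguments uses the assumed defaults; passing paths explicitly overrides them.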

The pod termination message is:

Pod The node was low on resource: [DiskPressure].
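To categorize evictions by cause, the resource named in that message can be counted across all evicted pods. The sketch below uses hypothetical function names (`eviction_messages`, `count_eviction_causes`) and only matches the bracketed message format shown above; other Kubernetes versions may word the message differently.

```shell
# Print the status message of every evicted pod, one per line.
eviction_messages() {
  kubectl get pods --all-namespaces --field-selector=status.phase=Failed \
    -o jsonpath='{range .items[?(@.status.reason=="Evicted")]}{.status.message}{"\n"}{end}'
}

# Read eviction messages on stdin and count them by the resource named in
# the "[...]" part, most frequent first. Lines in other formats are ignored.
count_eviction_causes() {
  sed -n 's/.*low on resource: \[\([^]]*\)].*/\1/p' | sort | uniq -c | sort -rn
}
```

`eviction_messages | count_eviction_causes` gives a quick per-cause tally before anything is deleted.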

/cc @WarheadsSE

Edited by Jason Plum