Monitor Kubernetes ephemeral storage
Pods that require some sort of storage but are not configured to use external volumes can exhaust the ephemeral-storage resource on their nodes, which leads to evictions, rescheduling, and a failure to clean up old pods. The result looks like this in `kubectl get pods`:
```
NAME                                READY   STATUS                   RESTARTS   AGE
thanos-compactor-86bffd98f5-2j5gv   0/1     ContainerStatusUnknown   1          167m
thanos-compactor-86bffd98f5-2qhnk   0/1     ContainerStatusUnknown   1          22h
thanos-compactor-86bffd98f5-4xjpk   0/1     ContainerStatusUnknown   1          21h
thanos-compactor-86bffd98f5-57vz8   0/1     ContainerStatusUnknown   1          93m
thanos-compactor-86bffd98f5-7d5d9   0/1     ContainerStatusUnknown   1          29m
...
thanos-compactor-86bffd98f5-845zg   0/1     ContainerStatusUnknown   1          58m
thanos-compactor-86bffd98f5-b6b8r   0/1     ContainerStatusUnknown   1          65m
thanos-compactor-86bffd98f5-chzhb   1/1     Running                  0          2m
...
thanos-compactor-86bffd98f5-jt4tj   0/1     Error                    0          21h
thanos-compactor-86bffd98f5-vnlrw   0/1     Error                    0          25m
thanos-compactor-86bffd98f5-w9h8w   0/1     ContainerStatusUnknown   1          80m
thanos-compactor-86bffd98f5-zcc6v   0/1     ContainerStatusUnknown   1          152m
```
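The dead pods stick around until they are removed, either manually or by the control plane's terminated-pod garbage collector once its threshold is reached. A quick manual sweep, assuming the evicted pods have ended up in the `Failed` phase and live in a hypothetical `monitoring` namespace:

```
# Inspect, then delete, leftover Failed pods (evicted / ContainerStatusUnknown
# pods normally end up in phase Failed). The "monitoring" namespace is an assumption.
kubectl -n monitoring get pods --field-selector=status.phase=Failed
kubectl -n monitoring delete pods --field-selector=status.phase=Failed
```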
The events on one of the evicted pods show the reason:

```
Warning  Evicted              <invalid>  kubelet  The node was low on resource: ephemeral-storage. Threshold quantity: 1054243241, available: 341120Ki. Container compactor was using 56Ki, request is 0, has larger consumption of ephemeral-storage.
Normal   Killing              <invalid>  kubelet  Stopping container compactor
Warning  ExceededGracePeriod  <invalid>  kubelet  Container runtime did not kill the pod within specified grace period.
```
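Until proper metrics are in place, per-pod ephemeral storage usage can be checked ad hoc through the kubelet's stats summary endpoint, proxied via the API server. A rough sketch (the node name placeholder and the jq filter are illustrative; it needs `jq` and RBAC access to `nodes/proxy`):

```
# Per-pod ephemeral storage usage on one node, via the kubelet summary API,
# sorted by bytes used. Replace <node-name> with an actual node.
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/stats/summary" \
  | jq -r '.pods[] | [.podRef.namespace, .podRef.name, ."ephemeral-storage".usedBytes] | @tsv' \
  | sort -k3 -n -r | head
```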
kube-state-metrics doesn't expose ephemeral storage information, and the upstream discussion about how to monitor it has been open since 2021 but looks stale at this point.
There is a dedicated exporter, https://github.com/jmcgrath207/k8s-ephemeral-storage-metrics, that deals with this gap in monitoring.
It might also be possible to extend kube-state-metrics to gain visibility into ephemeral storage usage - the **preferable** solution, as it wouldn't add an extra component to the monitoring stack.
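Whichever route is taken, the end goal is a per-pod usage metric that Prometheus can query and alert on. As a sketch, assuming the dedicated exporter above is deployed and exposes a per-pod gauge named `ephemeral_storage_pod_usage` (the metric name, labels, and the Prometheus address below are assumptions; check the exporter's README for the exact names), a candidate query can be tested against the Prometheus HTTP API:

```
# Pods using more than roughly 1 GiB of ephemeral storage.
# "ephemeral_storage_pod_usage" and the Prometheus address are assumptions.
curl -sG "http://prometheus.monitoring.svc:9090/api/v1/query" \
  --data-urlencode 'query=ephemeral_storage_pod_usage > 1e9' \
  | jq '.data.result'
```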