Monitor Kubernetes ephemeral storage

Pods that need some storage but are not configured to use external volumes can exhaust the ephemeral storage on their nodes. This leads to evictions, rescheduling, and failure to clean up old pods:

thanos-compactor-86bffd98f5-2j5gv        0/1     ContainerStatusUnknown   1          167m                                                                        
thanos-compactor-86bffd98f5-2qhnk        0/1     ContainerStatusUnknown   1          22h                                                                         
thanos-compactor-86bffd98f5-4xjpk        0/1     ContainerStatusUnknown   1          21h                                                                         
thanos-compactor-86bffd98f5-57vz8        0/1     ContainerStatusUnknown   1          93m                                                                         
thanos-compactor-86bffd98f5-7d5d9        0/1     ContainerStatusUnknown   1          29m
...                                                                       
thanos-compactor-86bffd98f5-845zg        0/1     ContainerStatusUnknown   1          58m                                                                         
thanos-compactor-86bffd98f5-b6b8r        0/1     ContainerStatusUnknown   1          65m                                                                         
thanos-compactor-86bffd98f5-chzhb        1/1     Running                  0          2m                                                                          
...
thanos-compactor-86bffd98f5-jt4tj        0/1     Error                    0          21h                                                                         
thanos-compactor-86bffd98f5-vnlrw        0/1     Error                    0          25m                                                                         
thanos-compactor-86bffd98f5-w9h8w        0/1     ContainerStatusUnknown   1          80m                                                                         
thanos-compactor-86bffd98f5-zcc6v        0/1     ContainerStatusUnknown   1          152m
  Warning  Evicted              <invalid>  kubelet            The node was low on resource: ephemeral-storage. Threshold quantity: 1054243241, available: 341120Ki. Container compactor was using 56Ki, request is 0, has larger consumption of ephemeral-storage.
  Normal   Killing              <invalid>  kubelet            Stopping container compactor
  Warning  ExceededGracePeriod  <invalid>  kubelet            Container runtime did not kill the pod within specified grace period.

kube-state-metrics doesn't expose ephemeral volume information. A discussion about how to monitor it has been ongoing since 2021, but it looks stale at this point.

https://github.com/jmcgrath207/k8s-ephemeral-storage-metrics is a project that deals with this gap in monitoring.

It might also be possible to extend kube-state-metrics to get visibility into ephemeral volume usage. This would be the **preferable** solution, as it wouldn't add an extra component to the monitoring stack.
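
As a stop-gap for ad-hoc checks, per-pod usage can probably be read from the kubelet Summary API through the API server proxy (`kubectl get --raw "/api/v1/nodes/<node>/proxy/stats/summary"`). The sketch below is not from either project mentioned above. It assumes the response has a `pods` list whose entries carry `podRef` and an `ephemeral-storage.usedBytes` field, and that `kubectl` is on the PATH with access to the cluster. The function names are made up for illustration.

```python
import json
import subprocess


def pod_ephemeral_usage(summary: dict) -> list[tuple[str, str, int]]:
    """Return (namespace, pod, used_bytes) tuples, largest consumer first.

    `summary` is the parsed JSON from the kubelet Summary API. Pods without
    an "ephemeral-storage" entry are reported with 0 bytes (an assumption
    made here so that missing data does not raise).
    """
    rows = []
    for pod in summary.get("pods", []):
        ref = pod.get("podRef", {})
        used = pod.get("ephemeral-storage", {}).get("usedBytes", 0)
        rows.append((ref.get("namespace", ""), ref.get("name", ""), used))
    return sorted(rows, key=lambda row: row[2], reverse=True)


def fetch_summary(node: str) -> dict:
    """Fetch the Summary API document for one node via the API server proxy."""
    result = subprocess.run(
        ["kubectl", "get", "--raw", f"/api/v1/nodes/{node}/proxy/stats/summary"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def print_usage(node: str) -> None:
    """Print one line per pod on the node: namespace/name and usage in MiB."""
    for namespace, name, used in pod_ephemeral_usage(fetch_summary(node)):
        print(f"{namespace}/{name}\t{used / 2**20:.1f} MiB")
```

Calling `print_usage("<node-name>")` against the node that reported the evictions should show which pods are consuming the most ephemeral storage there. This only covers manual inspection and does not replace exporting the data as metrics.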
