
feat: page EOC when elastic is close to disk saturation

Steve Xuereb requested to merge feat/alert-es-disk-saturation into master

What

  • Bump the disk saturation alert for elastic to a paging event.
  • Tune the thresholds to fire before problems show up.
  • Add a runbook on how to debug disk space issues, based on past experience.
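
As a rough illustration of the threshold idea above, here is a minimal Python sketch of classifying disk usage into severities. The function name `classify_disk_usage` and the 75% / 85% thresholds are assumptions made for this example only; they are not the values or the alerting configuration used in this merge request.

```python
# Hypothetical thresholds, chosen only for illustration.
# The actual values tuned in this merge request are not stated here.
WARN_RATIO = 0.75  # assumed: non-paging warning level
PAGE_RATIO = 0.85  # assumed: level at which the on-call is paged


def classify_disk_usage(used_bytes: int, total_bytes: int) -> str:
    """Return "ok", "warn", or "page" for a node's disk usage."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    ratio = used_bytes / total_bytes
    # Check the most severe threshold first so it takes precedence.
    if ratio >= PAGE_RATIO:
        return "page"
    if ratio >= WARN_RATIO:
        return "warn"
    return "ok"
```

The point of setting the paging threshold below full saturation is that the on-call is alerted while there is still headroom to act, rather than after the cluster has already degraded.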

Why

In gitlab-com/gl-infra/production#7669 (closed), the disk was being saturated and we didn't realize it until we had no primary. After some debugging, we traced the problem to a misconfiguration. An alert on disk space would have made it clearer to the on-call what the problem might be.

Signed-off-by: Steve Azzopardi <sazzopardi@gitlab.com>

Edited by Steve Xuereb
