feat: page EOC when elastic is close to disk saturation
What
- Make disk saturation for elastic a paging event.
- Tune the thresholds so that they fire before problems show up.
- Add a runbook on how to debug disk space issues, based on past experience.
Why
In gitlab-com/gl-infra/production#7669 the disk was being saturated, and we didn't realize it until we had no primary. After some debugging we traced the problem to a misconfiguration. An alert on disk space would make it clearer to the on-call what the problem might be.
Signed-off-by: Steve Azzopardi <sazzopardi@gitlab.com>
Edited by Steve Xuereb