
Adjust alerting/accounting for multiple servers

John Skarbek requested to merge jts/file-space-alerting into master
  • It was infuriating to be woken up by an alert telling me that one server out of an entire fleet had run out of disk space.
  • I don't care if 1 server of 14 runs out of space. If that one fails because of it, there are 13 others to take up the slack.
  • This modifies our existing disk alert so that it no longer pages us
    • Instead, it goes to a Slack channel, where there is still visibility, but it isn't an intrusive 'wake me up for this' situation
  • We also add an alert that takes into account that we mostly run multiple servers for each type and tier.
    • When more than 50% of the servers in a given tier and type have less than 10% disk space left on any mountpoint, that's when we should be nervous, and that's when we should get paged
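
The rule definitions themselves aren't shown here, so the following is only a minimal Python sketch of the decision logic described above, not the actual alert implementation. The `DiskSample` type, its field names, and the `evaluate` function are hypothetical; the thresholds (less than 10% free on any mountpoint, more than 50% of servers per type and tier) come from the description. Every low-disk host is reported for the non-paging Slack notification, and a page is raised only for a (type, tier) group where a majority of hosts are low.

```python
from collections import defaultdict
from dataclasses import dataclass

FREE_THRESHOLD = 0.10  # "less than 10% disk space left"
FLEET_FRACTION = 0.50  # "more than 50% of the servers"


@dataclass(frozen=True)
class DiskSample:
    # Hypothetical shape of one per-mountpoint disk reading.
    host: str
    type: str
    tier: str
    mountpoint: str
    free_fraction: float  # free bytes / total bytes, 0.0 to 1.0


def evaluate(samples, threshold=FREE_THRESHOLD, fleet_fraction=FLEET_FRACTION):
    """Return (slack_hosts, paged_groups).

    slack_hosts: hosts with at least one mountpoint below the threshold
                 (sent to Slack, never paged on their own).
    paged_groups: sorted (type, tier) groups where strictly more than
                  fleet_fraction of the hosts are low on disk.
    """
    hosts_by_group = defaultdict(set)
    low_by_group = defaultdict(set)
    for s in samples:
        group = (s.type, s.tier)
        hosts_by_group[group].add(s.host)
        # A host counts as low if ANY of its mountpoints is below the threshold.
        if s.free_fraction < threshold:
            low_by_group[group].add(s.host)

    slack_hosts = {s.host for s in samples if s.free_fraction < threshold}
    paged_groups = sorted(
        group
        for group, hosts in hosts_by_group.items()
        if len(low_by_group.get(group, ())) / len(hosts) > fleet_fraction
    )
    return slack_hosts, paged_groups


if __name__ == "__main__":
    # 14 hypothetical web servers, one of them low on /var/log.
    fleet = [DiskSample(f"web-{i:02d}", "web", "fe", "/", 0.5) for i in range(1, 15)]
    fleet[0] = DiskSample("web-01", "web", "fe", "/var/log", 0.05)
    print(evaluate(fleet))
```

With one full server out of 14, the host shows up in the Slack set but nothing is paged. The comparison is strictly "more than" 50%, so exactly 7 of 14 low servers still does not page, while 8 of 14 does.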

Signed-off-by: John T Skarbek jtslear@gmail.com
