Adjust alerting/accounting for multiple servers
- It was infuriating to be woken up by an alert telling me that one server in an entire fleet had run out of disk space.
- I don't care if 1 server of 14 ran out of space. If that one failed because of it, there are 13 others to take up the slack
- This modifies our existing disk alert so that it no longer pages us
- Instead, it'll go to a Slack channel, where there is still visibility but not an intrusive 'wake me up for this' situation
- We also create an alert that takes into account that we mostly run multiple servers for a type and tier.
- When more than 50% of the servers in any given tier and type have less than 10% disk space left on any mountpoint, that's when we should be nervous, and that's when we should get paged
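
For illustration only, here is a rough Python sketch of the paging condition described above. It is not the alert rule changed in this commit. The input shape (dicts with "type", "tier", and "free" keys), the constant names, and the function name are all hypothetical, and it assumes a server counts as low when any one of its mountpoints has less than 10% free.

```python
from collections import defaultdict

FREE_THRESHOLD = 0.10  # assumption: under 10% free on a mountpoint counts as low
FLEET_FRACTION = 0.50  # page only when MORE than 50% of a group is low


def groups_to_page(servers):
    """Return the sorted (type, tier) groups where over half the servers are low.

    `servers` is an iterable of dicts with hypothetical keys:
    "type", "tier", and "free" (a mapping of mountpoint -> fraction free).
    """
    total = defaultdict(int)
    low = defaultdict(int)
    for server in servers:
        group = (server["type"], server["tier"])
        total[group] += 1
        # A server counts as low if any of its mountpoints is under the threshold.
        if any(free < FREE_THRESHOLD for free in server["free"].values()):
            low[group] += 1
    return sorted(g for g in total if low[g] / total[g] > FLEET_FRACTION)


# Made-up example fleet: 2 of 3 "web" servers are low, 1 of 2 "db" servers is low.
fleet = [
    {"type": "web", "tier": "main", "free": {"/": 0.05, "/var": 0.40}},
    {"type": "web", "tier": "main", "free": {"/": 0.30, "/var": 0.08}},
    {"type": "web", "tier": "main", "free": {"/": 0.50, "/var": 0.60}},
    {"type": "db", "tier": "main", "free": {"/": 0.02}},
    {"type": "db", "tier": "main", "free": {"/": 0.70}},
]
paged = groups_to_page(fleet)
```

With this sample data only the web group would page: the db group sits at exactly 50%, which is not more than half. A single low server in a group of 14 would likewise stay in the Slack channel.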
Signed-off-by: John T Skarbek <jtslear@gmail.com>