Add alerting for runners cache machines

runners-cache-2 reached 96% disk usage, at which point all PUT requests to the runners cache failed with 500 errors.

We need to:

  1. Add alerting for disk space
  2. Add alerting for an unusually high number of 50x errors
  3. Consider an automatic recovery mechanism (e.g. clearing the cache)
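
As a starting point for discussion, items 1 and 2 could be sketched roughly as below. This is only an illustration, not a proposal for the final setup. The thresholds (`DISK_THRESHOLD`, `ERROR_RATE_THRESHOLD`), the function names, and the idea of feeding in a list of recent HTTP status codes are all assumptions. In practice these checks would more likely live in our monitoring system than in a standalone script.

```python
import shutil
from typing import Iterable, List

# Assumed thresholds, chosen only for illustration.
DISK_THRESHOLD = 0.90        # alert before reaching the 96% seen in the incident
ERROR_RATE_THRESHOLD = 0.05  # alert if 5% or more of recent responses are 5xx


def disk_usage_fraction(path: str = "/") -> float:
    """Return the used fraction (0.0 to 1.0) of the filesystem containing path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def server_error_rate(status_codes: Iterable[int]) -> float:
    """Return the fraction of responses with a 5xx status code."""
    codes = list(status_codes)
    if not codes:
        return 0.0
    errors = sum(1 for code in codes if 500 <= code <= 599)
    return errors / len(codes)


def check_alerts(disk_fraction: float, status_codes: Iterable[int]) -> List[str]:
    """Return a human-readable message for each condition that should alert."""
    alerts = []
    if disk_fraction >= DISK_THRESHOLD:
        alerts.append(f"disk usage at {disk_fraction:.0%}")
    rate = server_error_rate(status_codes)
    if rate >= ERROR_RATE_THRESHOLD:
        alerts.append(f"5xx error rate at {rate:.0%}")
    return alerts
```

For example, `check_alerts(disk_usage_fraction("/"), recent_codes)` would return an empty list when both values are under their thresholds, where `recent_codes` is a hypothetical list of status codes taken from the cache server's access log.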