Add monitoring and alerting for GitHost.io
We have Slack alerting for GitHost.io, but it's not being used effectively. https://gitlab.com/gitlab-com/support/issues/474 discusses what we need to do to improve GitHost, but with @dblessing out this has been delayed.
We really need to have the most basic alerting to do the following:
- Ensure that a host responds with 200 OK
- Page a support person if it does not respond within X minutes
- Monitor disk space: send an e-mail (preferably to the customer too) when it nears full usage, and page a support person when it becomes critical
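
To make the three checks above concrete, here is a rough sketch in plain Python. It is not our Prometheus setup, only an illustration of the decision logic. The thresholds (5 minutes standing in for "X", 80% and 90% disk usage) and all function names are assumptions for discussion, not agreed values.

```python
import http.client
import shutil
import urllib.request

# Hypothetical thresholds; the real values still need to be agreed on.
DOWN_PAGE_AFTER_SECONDS = 5 * 60  # stands in for "X minutes"
DISK_WARN_RATIO = 0.80            # e-mail when usage nears full
DISK_CRIT_RATIO = 0.90            # page when usage becomes critical


def host_is_up(url: str, timeout: float = 10.0) -> bool:
    """Return True only if the host answers the request with 200 OK."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError are OSError subclasses, so connection
        # failures, timeouts, and non-2xx responses all land here.
        return False


def downtime_action(seconds_down: float) -> str:
    """Decide what to do once a host has failed checks for seconds_down."""
    if seconds_down >= DOWN_PAGE_AFTER_SECONDS:
        return "page"
    return "none"


def disk_used_ratio(path: str = "/") -> float:
    """Fraction of the filesystem holding path that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def disk_action(used_ratio: float) -> str:
    """Map disk usage to an action: none, email, or page."""
    if used_ratio >= DISK_CRIT_RATIO:
        return "page"
    if used_ratio >= DISK_WARN_RATIO:
        return "email"
    return "none"
```

A scheduler would call `host_is_up` and `disk_used_ratio` on each instance at a fixed interval, track how long a host has been failing, and route `"email"` and `"page"` to whatever notification channels we settle on.
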
I think we're close to having this, but we need someone to focus on this ASAP.
I know @ahanselka was involved with this effort earlier with Prometheus. We'll need some reinforcements while the support team is shorthanded next week.
/cc: @lbot, @pcarranza, @ernstvn