Set up process/tools to monitor Monitor group's demo environments
The Monitor team now manages a few projects for validating features and demonstrating applications to customers. These projects run across a few different clusters.
It would be useful to set up monitoring for these clusters so that we can be informed when they have issues.
This would serve a dual purpose: it lets us dogfood our own features, and it gives us some 'light' SRE experience, which is critical because SREs are one of our primary users.
For a start:
- Set up some basic alerts that trigger when any of the clusters stops reporting Prometheus data.
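
As a rough starting point, the check behind such an alert could look like the sketch below. It assumes every cluster's series carry a `cluster` label and that the caller has already run an instant query such as `count by (cluster) (up)` against the Prometheus HTTP API (`/api/v1/query`). The label name, the query, the cluster names, and the `silent_clusters` helper are all illustrative assumptions, not something we have in place today.

```python
from typing import Iterable

# Assumed PromQL for the instant query; the `cluster` label is an assumption
# about how our series are labelled, not a confirmed convention.
QUERY = "count by (cluster) (up)"


def silent_clusters(response: dict, expected: Iterable[str]) -> list[str]:
    """Return the expected clusters that have no series in the query response.

    `response` is the parsed JSON body of a Prometheus instant query.
    Series that have stopped receiving samples go stale and drop out of
    instant-query results, so a missing cluster means it stopped reporting.
    """
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    reporting = {
        sample["metric"].get("cluster")
        for sample in response["data"]["result"]
    }
    return sorted(set(expected) - reporting)


# Hypothetical response in which only one of three demo clusters is reporting.
example_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"cluster": "demo-a"}, "value": [1700000000.0, "12"]},
        ],
    },
}

# Hypothetical cluster names; replace with the real list of demo clusters.
print(silent_clusters(example_response, ["demo-a", "demo-b", "demo-c"]))
```

Running this check on a schedule and notifying a channel whenever the returned list is non-empty would cover the basic case. A native Prometheus alerting rule built on `absent()` may be a better fit once we agree on labels.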