Cluster Monitoring Alerts
We should alert the Owners and Masters of a project when their cluster is unhealthy. We can detect this from the metrics we already monitor, for example when the cluster is running out of CPU, memory, or other resources.
While we can allow these alerts to be customized, they should be enabled by default, since it is relatively easy to pick reasonable default thresholds. For example, we can simply alert when total CPU, memory, or pod utilization reaches 90%.
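The default threshold check could look roughly like the sketch below. It is only an illustration under stated assumptions: the `ResourceUsage` type, the `unhealthy_resources` function, and the per-resource override dictionary are hypothetical names, not part of any existing monitoring code, and the used/capacity figures are assumed to come from the metrics we already collect.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

# Default from the proposal: alert at 90% utilization.
DEFAULT_THRESHOLD = 0.90


@dataclass
class ResourceUsage:
    """Hypothetical snapshot of one cluster-wide resource (CPU, memory, pods)."""
    name: str
    used: float
    capacity: float

    @property
    def utilization(self) -> float:
        # Treat a resource with no reported capacity as 0% utilized
        # rather than dividing by zero.
        return self.used / self.capacity if self.capacity > 0 else 0.0


def unhealthy_resources(
    usages: Iterable[ResourceUsage],
    overrides: Optional[Dict[str, float]] = None,
    default: float = DEFAULT_THRESHOLD,
) -> List[str]:
    """Return the names of resources at or above their alert threshold.

    `overrides` maps a resource name to a custom threshold; any resource
    without an override falls back to `default`.
    """
    overrides = overrides or {}
    return [
        u.name
        for u in usages
        if u.utilization >= overrides.get(u.name, default)
    ]


if __name__ == "__main__":
    snapshot = [
        ResourceUsage("cpu", used=9.5, capacity=10.0),
        ResourceUsage("memory", used=32.0, capacity=64.0),
        ResourceUsage("pods", used=100, capacity=110),
    ]
    print(unhealthy_resources(snapshot))
```

Keeping the threshold as a default with optional per-resource overrides matches the intent above: alerts work out of the box, and a project can still tune them later.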