feat: stop paging on hpa saturation
What
Stop paging the on-call engineer on HPA saturation.
Why
This alert has paged 30 times in the last 3 months. It is sometimes not actionable or, as stated earlier, just a symptom of a problem we are already working on because another SLO alert has fired.
Following the same methodology as My Philosophy on Alerting, we shouldn't page on cause-based alerts, especially if there is no user impact. This is similar to how we no longer alert on high CPU usage if the requests we serve to our users are within the specified SLO.
We can add capacity planning in tamland if we want forecasting for HPA saturation.
Reference: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15883
Edited by Steve Xuereb