Make the pager stop melting if the world is on fire.
Right now, if there's a major outage, the EOC gets paged... a lot.
This doesn't help with debugging, and it's very stressful.
This was mentioned in gitlab-com/gl-infra/production#19996 (comment 2561813561).
I see two solutions for this, both of which we probably should have done a while ago.
- Actually set up dependencies correctly so that if, say, Patroni is down, we don't page on anything else.
- Create a break-glass switch that says "the world is on fire and we know it", which shuts up paging for, say, an hour.
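
To make the two options concrete, here is a rough Python sketch of the decision logic, not a proposal for how we'd actually implement it. The `DEPENDS_ON` map, the service names other than patroni, and the `BreakGlass` and `should_page` names are all made up for illustration, and it only looks at direct dependencies. If the paging path goes through Alertmanager, these would roughly correspond to inhibit rules and a time-limited silence.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Set

# Hypothetical dependency map: service -> upstream services it depends on.
# The real map would have to come from our actual service catalog.
DEPENDS_ON = {
    "web": {"patroni"},
    "api": {"patroni"},
    "sidekiq": {"patroni"},
}


@dataclass
class BreakGlass:
    """Time-boxed 'the world is on fire and we know it' switch."""

    expires_at: Optional[datetime] = None

    def activate(self, now: datetime, duration: timedelta = timedelta(hours=1)) -> None:
        # Always set an expiry so the switch cannot be left on by accident.
        self.expires_at = now + duration

    def active(self, now: datetime) -> bool:
        return self.expires_at is not None and now < self.expires_at


def should_page(service: str, firing: Set[str], breakglass: BreakGlass, now: datetime) -> bool:
    """Return True if an alert for `service` should page the EOC."""
    # Option 2: break-glass mutes all paging until it expires.
    if breakglass.active(now):
        return False
    # Option 1: do not page if a direct upstream dependency is already firing.
    upstream = DEPENDS_ON.get(service, set())
    return not (upstream & firing)
```

With `firing = {"patroni", "web"}` and no break-glass active, `should_page("patroni", ...)` is true and `should_page("web", ...)` is false, so the EOC gets one page for the root cause instead of one per affected service.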
I'm curious to hear thoughts from @gitlab-org/production-engineering/observability on the best ways to do this.