feat: have web inhibit alerts from patroni
What
When the patroni service is violating its SLO, inhibit alerts from the
web service. To inspect the generated configuration, run
./alertmanager/generate.sh, which adds the following rules:
generated inhibit_rules
inhibit_rules:
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="rails_primary_sql"
  - type="patroni"
  target_matchers:
  - component="loadbalancer"
  - type="web"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="rails_replica_sql"
  - type="patroni"
  target_matchers:
  - component="loadbalancer"
  - type="web"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="rails_primary_sql"
  - type="patroni"
  target_matchers:
  - component="puma"
  - type="web"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="rails_replica_sql"
  - type="patroni"
  target_matchers:
  - component="puma"
  - type="web"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="rails_primary_sql"
  - type="patroni"
  target_matchers:
  - component="rails_requests"
  - type="web"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="rails_replica_sql"
  - type="patroni"
  target_matchers:
  - component="rails_requests"
  - type="web"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="rails_primary_sql"
  - type="patroni"
  target_matchers:
  - component="workhorse"
  - type="web"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="rails_replica_sql"
  - type="patroni"
  target_matchers:
  - component="workhorse"
  - type="web"
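The eight rules above are the cross product of two patroni source components and four web target components, all sharing the same `equal` labels. A minimal Python sketch of that structure follows. It is illustrative only and is not how ./alertmanager/generate.sh produces the configuration; the names `SOURCE_COMPONENTS`, `TARGET_COMPONENTS`, `EQUAL_LABELS`, and `build_inhibit_rules` are hypothetical.

```python
from itertools import product

# Values copied from the generated inhibit_rules above.
# The constant and function names are hypothetical, chosen for this sketch.
SOURCE_COMPONENTS = ["rails_primary_sql", "rails_replica_sql"]
TARGET_COMPONENTS = ["loadbalancer", "puma", "rails_requests", "workhorse"]
EQUAL_LABELS = ["env", "environment", "pager"]


def build_inhibit_rules():
    """Build one rule per (web target, patroni source) pair."""
    rules = []
    # Targets vary slowest and sources fastest, matching the order listed above.
    for target, source in product(TARGET_COMPONENTS, SOURCE_COMPONENTS):
        rules.append({
            "equal": list(EQUAL_LABELS),
            "source_matchers": [f'component="{source}"', 'type="patroni"'],
            "target_matchers": [f'component="{target}"', 'type="web"'],
        })
    return rules


rules = build_inhibit_rules()
print(len(rules))  # 4 targets x 2 sources
```

Because `env`, `environment`, and `pager` must be equal on both alerts, a firing patroni alert only mutes web alerts that carry the same values for those labels.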
Alerts that we are affecting and will need to monitor after this is rolled out
Why
When patroni goes down, everything else goes down. Instead of firing
multiple alerts that might distract the on-call, fire only the
important one (patroni).
Reference: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15766