feat: have web inhibit alerts from patroni
What
When the patroni service is violating its SLO, stop the web service from alerting. To look at the generated configuration, run ./alertmanager/generate.sh, which will add the following:
generated inhibit_rules
inhibit_rules:
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="rails_primary_sql"
  - type="patroni"
  target_matchers:
  - component="loadbalancer"
  - type="web"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="rails_replica_sql"
  - type="patroni"
  target_matchers:
  - component="loadbalancer"
  - type="web"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="rails_primary_sql"
  - type="patroni"
  target_matchers:
  - component="puma"
  - type="web"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="rails_replica_sql"
  - type="patroni"
  target_matchers:
  - component="puma"
  - type="web"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="rails_primary_sql"
  - type="patroni"
  target_matchers:
  - component="rails_requests"
  - type="web"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="rails_replica_sql"
  - type="patroni"
  target_matchers:
  - component="rails_requests"
  - type="web"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="rails_primary_sql"
  - type="patroni"
  target_matchers:
  - component="workhorse"
  - type="web"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="rails_replica_sql"
  - type="patroni"
  target_matchers:
  - component="workhorse"
  - type="web"
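The eight rules above are the cross product of two patroni source components and four web target components. The Python sketch below rebuilds that structure for illustration only. It is not the logic inside ./alertmanager/generate.sh, and the names `SOURCE_COMPONENTS`, `TARGET_COMPONENTS`, `EQUAL_LABELS`, and `build_inhibit_rules` are hypothetical.

```python
from itertools import product

# Component names copied from the generated rules; constant names are hypothetical.
SOURCE_COMPONENTS = ["rails_primary_sql", "rails_replica_sql"]  # type="patroni"
TARGET_COMPONENTS = ["loadbalancer", "puma", "rails_requests", "workhorse"]  # type="web"
EQUAL_LABELS = ["env", "environment", "pager"]


def build_inhibit_rules():
    """Return one inhibit rule per (web target, patroni source) pair."""
    rules = []
    # Targets vary slowest so the order matches the generated listing.
    for target, source in product(TARGET_COMPONENTS, SOURCE_COMPONENTS):
        rules.append(
            {
                # Inhibit only when source and target agree on these labels.
                "equal": list(EQUAL_LABELS),
                "source_matchers": [f'component="{source}"', 'type="patroni"'],
                "target_matchers": [f'component="{target}"', 'type="web"'],
            }
        )
    return rules


rules = build_inhibit_rules()
for rule in rules:
    print(rule["source_matchers"][0], "inhibits", rule["target_matchers"][0])
```

Because every rule lists env, environment, and pager under equal, a firing patroni alert should only silence web alerts that carry the same values for those three labels.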
Alerts that we are affecting and will need to monitor after this is rolled out
Why
When patroni goes down, everything else goes down. Instead of firing multiple alerts that might distract the on-call, fire only one: the important one (patroni).
Reference: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15766
Edited by Steve Xuereb