feat: have web inhibit alerts from patroni

Steve Xuereb requested to merge feat/inhibit-web-alerts-from-patroni into master

What

When the patroni service is violating its SLO, inhibit alerts from the web service. To inspect the generated configuration, run ./alertmanager/generate.sh, which will add the following:

generated inhibit_rules
inhibit_rules:
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="rails_primary_sql"
  - type="patroni"
  target_matchers:
  - component="loadbalancer"
  - type="web"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="rails_replica_sql"
  - type="patroni"
  target_matchers:
  - component="loadbalancer"
  - type="web"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="rails_primary_sql"
  - type="patroni"
  target_matchers:
  - component="puma"
  - type="web"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="rails_replica_sql"
  - type="patroni"
  target_matchers:
  - component="puma"
  - type="web"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="rails_primary_sql"
  - type="patroni"
  target_matchers:
  - component="rails_requests"
  - type="web"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="rails_replica_sql"
  - type="patroni"
  target_matchers:
  - component="rails_requests"
  - type="web"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="rails_primary_sql"
  - type="patroni"
  target_matchers:
  - component="workhorse"
  - type="web"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="rails_replica_sql"
  - type="patroni"
  target_matchers:
  - component="workhorse"
  - type="web"
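
The eight rules above are the cross product of two patroni source components and four web target components. A minimal Python sketch of that pattern follows. It is not the actual generator behind ./alertmanager/generate.sh; the function and constant names are hypothetical, and only the label values are taken from the generated output above.

```python
from itertools import product

# Label values copied from the generated rules above.
# The constant and function names are hypothetical, for illustration only.
SOURCE_COMPONENTS = ["rails_primary_sql", "rails_replica_sql"]
TARGET_COMPONENTS = ["loadbalancer", "puma", "rails_requests", "workhorse"]
EQUAL_LABELS = ["env", "environment", "pager"]


def build_inhibit_rules(source_type="patroni", target_type="web"):
    """Return one inhibit rule per (target component, source component) pair."""
    rules = []
    # Targets vary slowest, which mirrors the ordering of the generated output.
    for target, source in product(TARGET_COMPONENTS, SOURCE_COMPONENTS):
        rules.append(
            {
                "equal": list(EQUAL_LABELS),
                "source_matchers": [
                    f'component="{source}"',
                    f'type="{source_type}"',
                ],
                "target_matchers": [
                    f'component="{target}"',
                    f'type="{target_type}"',
                ],
            }
        )
    return rules


rules = build_inhibit_rules()
print(len(rules))  # 2 sources x 4 targets = 8 rules
```

Enumerating every pair keeps each rule narrow: only the listed web components are inhibited, and only by the two listed patroni SQL components.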

Alerts that we are affecting and will need to monitor after this is rolled out

Why

When patroni goes down, everything else goes down. Instead of firing multiple alerts that might distract the on-call, fire only one: the important one (patroni).
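
To make the intended effect concrete, here is a rough Python model of how an Alertmanager inhibit rule decides whether a web alert is muted. It is a simplified sketch, not Alertmanager's implementation: matchers are reduced to exact label equality, the helper names are hypothetical, and the env, environment, and pager values in the example alerts are assumptions chosen for illustration.

```python
EQUAL_LABELS = ("env", "environment", "pager")


def matches(labels, matchers):
    """True if every matcher label is present with exactly the given value."""
    return all(labels.get(name) == value for name, value in matchers.items())


def is_inhibited(target, firing, source_matchers, target_matchers,
                 equal=EQUAL_LABELS):
    """True if some firing source alert inhibits the target alert.

    A missing label is treated like an empty value when comparing the
    `equal` labels.
    """
    if not matches(target, target_matchers):
        return False
    return any(
        matches(source, source_matchers)
        and all(source.get(l, "") == target.get(l, "") for l in equal)
        for source in firing
    )


# One rule from the merge request: patroni primary SQL inhibits web puma.
source_matchers = {"component": "rails_primary_sql", "type": "patroni"}
target_matchers = {"component": "puma", "type": "web"}

# Example alerts; the env/environment/pager values are illustrative assumptions.
patroni_alert = {"component": "rails_primary_sql", "type": "patroni",
                 "env": "gprd", "environment": "gprd", "pager": "pagerduty"}
web_same_env = {"component": "puma", "type": "web",
                "env": "gprd", "environment": "gprd", "pager": "pagerduty"}
web_other_env = {"component": "puma", "type": "web",
                 "env": "gstg", "environment": "gstg", "pager": "pagerduty"}

same_env_muted = is_inhibited(
    web_same_env, [patroni_alert], source_matchers, target_matchers)
other_env_muted = is_inhibited(
    web_other_env, [patroni_alert], source_matchers, target_matchers)
print(same_env_muted, other_env_muted)
```

Because env, environment, and pager must all be equal, a patroni alert in one environment mutes only the web alerts from that same environment and paging route.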

Reference: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15766

Edited by Steve Xuereb
