
feat: web-pages dependsOn api

Steve Xuereb requested to merge feat/web-pages-depend-on-api into master

What

Don't page on web-pages when the api service is firing alerts already.

This creates the following inhibit rules:

inhibit rules
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="loadbalancer"
  - type="api"
  target_matchers:
  - component="loadbalancer"
  - type="web-pages"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="nginx_ingress"
  - type="api"
  target_matchers:
  - component="loadbalancer"
  - type="web-pages"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="workhorse"
  - type="api"
  target_matchers:
  - component="loadbalancer"
  - type="web-pages"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="rails_requests"
  - type="api"
  target_matchers:
  - component="loadbalancer"
  - type="web-pages"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="loadbalancer"
  - type="api"
  target_matchers:
  - component="loadbalancer_https"
  - type="web-pages"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="nginx_ingress"
  - type="api"
  target_matchers:
  - component="loadbalancer_https"
  - type="web-pages"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="workhorse"
  - type="api"
  target_matchers:
  - component="loadbalancer_https"
  - type="web-pages"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="rails_requests"
  - type="api"
  target_matchers:
  - component="loadbalancer_https"
  - type="web-pages"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="loadbalancer"
  - type="api"
  target_matchers:
  - component="server"
  - type="web-pages"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="nginx_ingress"
  - type="api"
  target_matchers:
  - component="server"
  - type="web-pages"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="workhorse"
  - type="api"
  target_matchers:
  - component="server"
  - type="web-pages"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="rails_requests"
  - type="api"
  target_matchers:
  - component="server"
  - type="web-pages"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="loadbalancer"
  - type="api"
  target_matchers:
  - component="server_headers"
  - type="web-pages"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="nginx_ingress"
  - type="api"
  target_matchers:
  - component="server_headers"
  - type="web-pages"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="workhorse"
  - type="api"
  target_matchers:
  - component="server_headers"
  - type="web-pages"
- equal:
  - env
  - environment
  - pager
  source_matchers:
  - component="rails_requests"
  - type="api"
  target_matchers:
  - component="server_headers"
  - type="web-pages"
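
The 16 rules above are the cross product of four api components and four web-pages components. As a rough illustration only (this is not the generator used in the repository; the function name `build_inhibit_rules` and the constants are hypothetical), the same list could be produced like this in Python:

```python
from itertools import product

# Component names copied from the rules listed above.
API_COMPONENTS = ["loadbalancer", "nginx_ingress", "workhorse", "rails_requests"]
WEB_PAGES_COMPONENTS = ["loadbalancer", "loadbalancer_https", "server", "server_headers"]
# Labels that must have the same value on the source and target alerts.
EQUAL_LABELS = ["env", "environment", "pager"]


def build_inhibit_rules(source_type, source_components, target_type, target_components):
    """Hypothetical helper: one inhibit rule per (target, source) component pair."""
    rules = []
    # Targets vary in the outer loop so the order matches the listing above.
    for target, source in product(target_components, source_components):
        rules.append({
            "equal": list(EQUAL_LABELS),
            "source_matchers": [f'component="{source}"', f'type="{source_type}"'],
            "target_matchers": [f'component="{target}"', f'type="{target_type}"'],
        })
    return rules


rules = build_inhibit_rules("api", API_COMPONENTS, "web-pages", WEB_PAGES_COMPONENTS)
print(len(rules))  # 4 sources x 4 targets
```

Every firing api component therefore silences every web-pages component, provided the `env`, `environment`, and `pager` labels agree.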

Why

On 2022-03-31 we saw service degradation on the api service, and as a result web-pages also violated its SLO. This is because web-pages depends heavily on api for domain information and for any other data retrieval that lives in the monolith.

In https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/16030#note_1024703191 we validated that service chains work as expected. The chain we have here is patroni -> api -> web-pages. If patroni is firing an alert, neither api nor web-pages will fire an alert, even if they are violating their SLOs.
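
To illustrate how a single link in such a chain suppresses an alert, here is a minimal Python sketch of inhibit-rule evaluation. It is a simplification, not Alertmanager's implementation: it handles only `name="value"` equality matchers, the helper names are hypothetical, and the label values `gprd` and `pagerduty` are assumed for the example. Missing labels are treated as empty strings, as Alertmanager does for `equal`.

```python
def parse_matcher(matcher):
    """Split 'name="value"' into (name, value). Equality matchers only."""
    name, value = matcher.split("=", 1)
    return name, value.strip('"')


def matches(alert, matchers):
    """True if the alert's labels satisfy every matcher."""
    return all(alert.get(name, "") == value
               for name, value in map(parse_matcher, matchers))


def is_inhibited(target, firing, rules):
    """True if any firing alert inhibits `target` under any rule."""
    for rule in rules:
        if not matches(target, rule["target_matchers"]):
            continue
        for source in firing:
            if source is target:
                continue  # an alert does not inhibit itself
            same_labels = all(source.get(label, "") == target.get(label, "")
                              for label in rule["equal"])
            if same_labels and matches(source, rule["source_matchers"]):
                return True
    return False


# One api -> web-pages rule in the shape used by this merge request.
rule = {
    "equal": ["env", "environment", "pager"],
    "source_matchers": ['component="workhorse"', 'type="api"'],
    "target_matchers": ['component="server"', 'type="web-pages"'],
}

# Label values below are assumptions for illustration.
api_alert = {"type": "api", "component": "workhorse", "env": "gprd", "pager": "pagerduty"}
pages_alert = {"type": "web-pages", "component": "server", "env": "gprd", "pager": "pagerduty"}
other_env = dict(pages_alert, env="gstg")

print(is_inhibited(pages_alert, [api_alert], [rule]))  # same env and pager
print(is_inhibited(other_env, [api_alert], [rule]))    # env differs
```

The first call reports the web-pages alert as inhibited; the second does not, because the `env` label differs between source and target.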

Reference: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/16030
