Autogenerate routing tree based on feature categories

Introduction

There is a requirement for teams to receive service level alerts for the services that they are responsible for.

For example:

  • An EM, @zj-gitlab, approached me saying "yesterday there was an incident involving one of our services. The Gitaly team would have liked to help out, but were unaware of the incident".
  • The Pages team would like to be notified of service-level degradations in the Pages service: #344 (closed)

Problem

Currently, our alert routing tree is quite ad hoc. Additionally, it's difficult to update the tree so that alerts are routed to a particular team.

Proposal

Use the feature_category attribute together with the service catalog to generate the routing tree, so that alerts carrying the appropriate feature_category label are routed to the responsible engineering teams and those teams are aware of ongoing incidents.
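
As a rough illustration of what the generation step could look like, here is a minimal Python sketch. It assumes an Alertmanager-style routing tree (`receiver`, `routes`, `matchers`, `continue`) and a simplified service catalog in which each team entry lists the feature categories it is responsible for. The catalog shape, the `team_<name>` receiver naming, and the `build_routes` helper are all hypothetical and not taken from the runbooks repo.

```python
def build_routes(teams, default_receiver="default"):
    """Build an Alertmanager-style routing tree from team entries.

    `teams` is a list of dicts with a 'name' and a list of
    'feature_categories' (hypothetical catalog shape). Category names
    are assumed to be plain identifiers with no regex metacharacters.
    """
    routes = []
    # Sort so the generated tree is stable between runs.
    for team in sorted(teams, key=lambda t: t["name"]):
        categories = sorted(team.get("feature_categories", []))
        if not categories:
            continue  # teams with no categories get no route
        pattern = "|".join(categories)
        routes.append({
            # Hypothetical receiver naming convention.
            "receiver": "team_" + team["name"],
            # Match any alert whose feature_category label is owned by the team.
            "matchers": ['feature_category=~"' + pattern + '"'],
            # Keep evaluating later routes so existing routing still applies.
            "continue": True,
        })
    return {"receiver": default_receiver, "routes": routes}


# Illustrative catalog entries only.
example_teams = [
    {"name": "pages", "feature_categories": ["pages"]},
    {"name": "gitaly", "feature_categories": ["gitaly"]},
]
example_tree = build_routes(example_teams)
```

Under these assumptions, a team would opt in by adding or editing its catalog entry, and the tree would be regenerated from that single source rather than edited by hand.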

Benefit/Impact

  1. More self-service capability for teams to receive the alerts they're interested in (through MRs on the runbooks repo, where the required change is presumably easy to determine)
  2. Broader understanding of the operational system by dev teams

Tasks

TBD

Edited by Craig Miskell