Open production issues for SLO alerts for services owned by the Observability SRE Team

With the rollout of Thanos Frontend, we'll have a better cache and proxy in front of the query server, which should make what I'm proposing in this issue an even better idea. But why wait?

Our thanos_query SLO violations are generally caused by specific poorly performing queries, either user-initiated or from Grafana dashboards. Each dip is an opportunity to optimize a dashboard or to reach out to a team member and offer our assistance. The best place to do this is in an individual issue.

Definition of Done

each thanos_query SLO violation opens an Incident in the production tracker via GitLab's Alertmanager integration
each thanos_query SLO violation alert routes directly to the #sre_observability Slack channel
- this doesn't seem to work. Let's discuss it in https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/12131
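
The routing described in the two items above can be sketched roughly as follows. This is a minimal Python model of Alertmanager-style routing, not our actual configuration: the label names (`service`, `alert_class`), the receiver names, and the `receivers_for` helper are all hypothetical. It only illustrates that a `continue`-style flag on the first route lets the same alert reach both the incident receiver and the Slack receiver.

```python
from typing import Dict, List

# Hypothetical route table. Label names and receiver names are illustrative
# assumptions, not taken from the real Alertmanager configuration.
ROUTES = [
    {
        # Would open an Incident in the production tracker.
        "match": {"service": "thanos_query", "alert_class": "slo_violation"},
        "receiver": "gitlab_production_incident",
        "continue": True,  # keep evaluating sibling routes after a match
    },
    {
        # Would post to the #sre_observability Slack channel.
        "match": {"service": "thanos_query", "alert_class": "slo_violation"},
        "receiver": "slack_sre_observability",
        "continue": False,
    },
]
DEFAULT_RECEIVER = "default"


def receivers_for(labels: Dict[str, str]) -> List[str]:
    """Return the receivers an alert with these labels would be sent to."""
    matched: List[str] = []
    for route in ROUTES:
        # A route matches when every label it names has the expected value.
        if all(labels.get(key) == value for key, value in route["match"].items()):
            matched.append(route["receiver"])
            if not route["continue"]:
                break
    # Alerts that match no route fall back to the default receiver.
    return matched or [DEFAULT_RECEIVER]


if __name__ == "__main__":
    alert = {"service": "thanos_query", "alert_class": "slo_violation"}
    print(receivers_for(alert))
```

Without the `continue` flag on the first route, matching would stop there and the Slack receiver would never see the alert.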

Nice To Have

each of the Incident issues is assigned to @sre-observability
- split into https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/12131

Edited Dec 11, 2020 by Craig Furman