GitLab.com Infra Dogfooding of Metrics and Incidents
Exploratory issue on what it would take for GitLab.com's Infra team to start dogfooding metrics and incidents.
Our current capabilities are documented in the [Prometheus integration docs](https://docs.gitlab.com/ee/user/project/integrations/prometheus.html).
## Intent
The intent is not to use GitLab.com as a representative sample of our customers' size and complexity, but to [put the system through its paces and generate rapid feedback on direction](https://about.gitlab.com/handbook/product/#dogfood-everything).
## Tasks
* [x] Identify stakeholders
* [x] Ensure common understanding of intent
* [x] Understand current pain points and the rationale for not using these features
* [x] Discuss possible initial steps (e.g., a project with a single metric on a GitLab dashboard)
* [x] Populate this epic with actions
## Actions
**design.gitlab.com**
* [x] Investigate [design.gitlab.com's project](https://gitlab.com/gitlab-org/gitlab-services/design.gitlab.com/environments/269942/metrics) to see if [dashboards](https://gitlab.com/gitlab-org/gitlab-services/design.gitlab.com/environments/269942/metrics) and alarms are set up - @kencjohnston
* [ ] If not, set up alarms to create incidents
**versions.gitlab.com**
* [ ] Move to Auto DevOps (ADO)
* [ ] Configure metrics and alerts
* [ ] Create incidents from alerts
**license.gitlab.com**
* [ ] Move to Auto DevOps (ADO)
* [ ] Configure metrics and alerts
* [ ] Create incidents from alerts
**Other**
* [ ] Educate the SRE/Infra team on Ops capabilities and vision - @kencjohnston
* [ ] Create issue for better display of alarm threshold time windows (like Datadog's) - @kencjohnston
* [ ] Create issue for allowing source-controlled Alertmanager configuration to be easily updated on the Prometheus cluster during deploys - @kencjohnston
Key questions are discussed in the threads below.