Monitoring enabled by default for all large customers (Prometheus, Grafana and all exporters)
## Overview

**DRI** - @kencjohnston

This epic is part of the work for the [Self-managed Scalability Working Group](https://about.gitlab.com/company/team/structure/working-groups/self-managed-scalability/)

## Description

We struggle to support large self-managed customers as they undergo scalability challenges. This effort encompasses adding self-monitoring with Prometheus and Grafana as a default for large customers.

Adding it as a default for future deployments by large customers is not sufficient to alleviate our ongoing pain. We must also have a plan to retrofit existing customers to enable the service as part of a [migration strategy](https://gitlab.com/groups/gitlab-org/-/epics/1340).

## Success

* GitLab defaults to enabling Prometheus and Grafana for any new installations planning to support over 10k users
* Plans available for facilitating the configuration changes to existing 10k+ user instances to enable Prometheus and Grafana
* Delayed Goal - 100% of customers supporting over 10k users have Prometheus and Grafana enabled by default (after [migration strategy](https://gitlab.com/groups/gitlab-org/-/epics/1340) complete)

## Tasks

* [x] Flesh out Epic with initial set of issues (product and process)
* [x] Update description with more detailed plan

Questions:

- How will we configure Prometheus automatically? Currently it's a manual task in HA/scaled environments. This is related to service discovery.

## Delivered

For new large customers...

* All new Professional Service engagements now include a Prometheus & Grafana node as part of every installation engagement.
* Every TAM has access to the Prom&Grafana playbook and is encouraging new customers to set up monitoring as part of their architecture recommendations.
* GitLab Omnibus now includes features to make setting this up a lot easier (self-discovery).

For existing Top 25 largest customers...

https://docs.google.com/document/d/15VbfvUABx2OTjsHaL0-QQ5FPNoxj57JTspsQfSTTdqQ/edit

* Every TAM has access to the Prom&Grafana playbook and has been in contact with their customers re: their monitoring usage, needs and adoption.