Out-of-the-box NGINX alerts for auto-monitoring in Auto DevOps
Problem to solve
We can further reduce the configuration users need before they see value from the Monitor stage by adding out-of-the-box alerts for NGINX metrics that work as soon as auto-monitoring is set up for Auto DevOps. This will:
- Reduce time to value for users
- Demonstrate alerting and encourage users to explore alerting
The Monitor stage is quickly maturing, and it's desirable to lower the barrier to entry for new users. Adding alerts to auto-monitoring in Auto DevOps removes a configuration step and shows users how alerting works.
Intended Users
Sasha the Software Developer
Devon the DevOps Engineer
Sidney the Systems Administrator
Further Details
This work supports the Incident Management Vision
Proposal
For newly created projects, as soon as a user installs Prometheus on their Cluster, two default metrics would have the following alerts applied:
- On the Throughput metric, add an alert to auto-monitoring in Auto DevOps when 5xx status codes ≥ 0.1% for 10 minutes
- On the HTTP error rate metric, add an alert to auto-monitoring in Auto DevOps when 5xx errors ≥ 0.1% for 10 minutes
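As a rough illustration of the condition both alerts describe, here is a minimal Python sketch. It is not how GitLab or Prometheus evaluates alerts; in practice this would be expressed as a Prometheus alerting rule with a `for` duration. The `Sample` type, the one-minute bucketing, and the reading of "0.1%" as a share of total requests are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Sequence

THRESHOLD_PERCENT = 0.1  # from the proposal: 5xx >= 0.1%
SUSTAIN_MINUTES = 10     # from the proposal: sustained for 10 minutes


@dataclass(frozen=True)
class Sample:
    """Hypothetical one-minute bucket of NGINX request counts."""
    total_requests: int
    errors_5xx: int


def error_percent(sample: Sample) -> float:
    """Share of requests that returned a 5xx status, as a percentage."""
    if sample.total_requests == 0:
        return 0.0  # assumption: no traffic counts as no errors
    return 100.0 * sample.errors_5xx / sample.total_requests


def should_fire(
    samples: Sequence[Sample],
    threshold_percent: float = THRESHOLD_PERCENT,
    sustain_minutes: int = SUSTAIN_MINUTES,
) -> bool:
    """Return True when each of the most recent `sustain_minutes`
    one-minute samples (ordered oldest first) is at or above the threshold."""
    if len(samples) < sustain_minutes:
        return False  # not enough history to call the condition sustained
    recent = samples[-sustain_minutes:]
    return all(error_percent(s) >= threshold_percent for s in recent)
```

Under these assumptions, a single healthy minute inside the window resets the condition, which mirrors the intent of "for 10 minutes": a brief spike alone should not raise the alert.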
Additional details
- These alerts should NOT automatically trigger the creation of incidents.
- These behaviors should apply only to newly created projects and new Prometheus installs, not to existing ones. We don't want to inadvertently change existing alert settings.
Permissions and Security
Documentation
Documentation required. Please update this section of the docs: https://docs.gitlab.com/ee/topics/autodevops/stages.html#auto-monitoring