Further documentation about stage group dashboards
In #665 (closed), we aim to provide basic introductory documentation about the stage group dashboards.
This issue is to extend that documentation with other useful topics. Further items will be added as we communicate with the stage groups.
- From #665 (comment 471948549) (Done, already included in the introductory documentation)
- Overview and summary of the components inside a dashboard. Details worth paying attention to include the filters (PROMETHEUS_DS, environment, deploy, canary-deploy, feature-flags), aggregation period, time interval, etc.
- Meaning of each panel and metric shown in the stage group dashboard. This may be too verbose, but it is helpful, especially for less experienced engineers. For example, when looking at the "request rate per action web" panel of the Monitor group, it's hard to tell what a value of 1 for ProjectsController#Show actually means: whether it is 1 request per minute or 1 request per 30 seconds, why the dashboard shows 1.2 rather than 1, what "Request rate per action git" means, etc.
- How to use the metrics for debugging?
- Drill down, filters
- Explore more with PromQL and the Explore feature of Grafana.
- A real example of how to debug a production issue
- How to customize and expand the metrics dashboards
- How do we record and process the metrics? The pull (scrape) mechanism of Prometheus and metric aggregation may affect user assumptions about accuracy and precision over a time period.
- Further links and documents
- Future roadmap and milestones
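
To illustrate the question above about why a panel might show 1.2 rather than 1, here is a minimal sketch of how a per-second rate can be derived from periodically scraped counter samples. It assumes the panel plots an average rate over a time window, which this issue does not confirm. The function `simple_rate`, the 15-second scrape interval, and the sample values are all hypothetical, and the calculation is deliberately simpler than what Prometheus does (it ignores counter resets and window-boundary extrapolation).

```python
from typing import List, Tuple

# (timestamp in seconds, cumulative request count); hypothetical sample shape
Sample = Tuple[float, float]


def simple_rate(samples: List[Sample]) -> float:
    """Average per-second increase between the first and last sample.

    Simplified illustration only: no counter-reset handling and no
    extrapolation to the window boundaries.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a rate")
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    return (v_last - v_first) / (t_last - t_first)


# Hypothetical counter scraped every 15 seconds: 54 requests over 45 seconds.
samples = [(0.0, 100.0), (15.0, 118.0), (30.0, 136.0), (45.0, 154.0)]

per_second = simple_rate(samples)
per_minute = per_second * 60

print(f"{per_second:.2f} requests/second")
print(f"{per_minute:.0f} requests/minute")
```

Because the value is an average over a window rather than a count of individual requests, fractional values are expected, and the unit (per second or per minute) depends on how the panel query scales the result.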
Edited by Quang-Minh Nguyen