Stage group error budget exploration dashboard
With the work from &525 (closed), the metrics now carry more fine-grained information for stage groups, so they can explore where their error budget is being spent without navigating to the logs as often. The logs will still hold the most detail, but the metrics could already show which endpoints contribute the most.
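As a rough illustration of "which endpoints contribute the most", the sketch below ranks endpoints by their share of a stage group's failed requests. It assumes per-endpoint error counts for one time window have already been pulled from the metrics. `EndpointErrors` and `top_contributors` are hypothetical names used only for this example, not part of the runbooks tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointErrors:
    """Hypothetical per-endpoint error count for a single time window."""
    endpoint: str
    errors: int


def top_contributors(counts, limit=5):
    """Return (endpoint, share_of_errors) pairs, largest share first.

    Every endpoint's errors are measured against the same error budget, so
    its share of the group's total errors is also its share of the error
    budget spent (for the error-ratio SLI).
    """
    total_errors = sum(c.errors for c in counts)
    if total_errors == 0:
        return []
    ranked = sorted(counts, key=lambda c: c.errors, reverse=True)
    # Drop endpoints that did not spend any budget before applying the limit.
    return [(c.endpoint, c.errors / total_errors) for c in ranked if c.errors > 0][:limit]
```

Ranking by share of errors rather than by error ratio keeps low-traffic endpoints with a high error ratio from crowding out the endpoints that actually consume most of the budget.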
I think we could create a dashboard per stage group that looks very similar to a service overview dashboard (for example, the one for web).
The first row could be the summary (what we currently show on the stage group dashboard). Below it, we would show things the same way we do on service overviews:
- Aggregated SLIs: separate error and apdex ratios, with MWMBR thresholds based on the 99.95% SLO - gitlab-com/runbooks!4225 (merged)
- One row per SLI: separate error and apdex ratios, with the same thresholds - gitlab-com/runbooks!4099 (merged)
- One row per SLI (collapsed by default) that breaks the SLI down per "significant label", i.e. per service, endpoint, worker, etc. - gitlab-com/runbooks!4129 (merged)
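For reference, multi-window, multi-burn-rate (MWMBR) thresholds are derived from the SLO's error budget: with a 99.95% SLO the budget is 0.05%, and each threshold is that budget multiplied by a burn rate. The sketch below assumes the two burn-rate/window pairs popularised by the Google SRE workbook (14.4x over 1h/5m and 6x over 6h/30m); whether the runbooks use exactly these values is an assumption, and `mwmbr_thresholds` is a hypothetical helper.

```python
from decimal import Decimal

# Assumed (long window, short window) -> burn rate pairs, following the
# Google SRE workbook; not confirmed to match the runbooks configuration.
BURN_RATES = {
    ("1h", "5m"): Decimal("14.4"),
    ("6h", "30m"): Decimal("6"),
}


def mwmbr_thresholds(slo: Decimal):
    """Return error and apdex ratio thresholds for each window pair.

    An MWMBR alert condition holds when the error ratio is above the error
    threshold (or the apdex ratio is below the apdex threshold) in BOTH the
    long and the short window of a pair.
    """
    budget = Decimal(1) - slo  # a 99.95% SLO leaves a 0.0005 error budget
    thresholds = {}
    for windows, burn_rate in BURN_RATES.items():
        error_threshold = burn_rate * budget
        thresholds[windows] = {
            "error_ratio_above": error_threshold,
            "apdex_ratio_below": Decimal(1) - error_threshold,
        }
    return thresholds


if __name__ == "__main__":
    for windows, t in mwmbr_thresholds(Decimal("0.9995")).items():
        print(windows, t["error_ratio_above"], t["apdex_ratio_below"])
```

`Decimal` is used so the thresholds come out exactly (for example 14.4 × 0.0005 = 0.0072) instead of picking up binary floating-point noise.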
Other tasks:
- Allow Sidekiq SLIs to be present - #1313 (closed)
- Split these into separate dashboards per stage group (gitlab-com/runbooks!4246 (merged)), allowing:
  - Links to and from the main stage group dashboards
  - Inclusion of SLIs with static categories
  - Working Kibana links
Edited by Sean McGivern