  • #1365
You need to sign in or sign up before continuing.
Closed
Open
Issue created Oct 22, 2021 by Bob Van Landuyt@reprazentMaintainer4 of 5 checklist items completed4/5 checklist items

Stage group error budget exploration dashboard

With the work from &525 (closed), we now have more fine-grained information in metrics, which lets stage groups explore where their error budget is being spent without navigating to the logs as often. The logs will still hold the most detail, but the metrics can already show which endpoints contribute the most.
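To illustrate the kind of question these metrics can answer, here is a minimal Python sketch that ranks endpoints by the fraction of a stage group's error budget they spent in a window. It assumes per-endpoint request and error counts are already available (for example, from a metrics query) and that the budget is (1 - SLO) × total requests. `EndpointStats`, `budget_spent_by_endpoint`, and the endpoint names and counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EndpointStats:
    """Hypothetical per-endpoint counts for one stage group over one window."""
    endpoint: str
    requests: int
    errors: int


def budget_spent_by_endpoint(stats, slo=0.9995):
    """Rank endpoints by the fraction of the group's error budget they spent.

    The budget is the number of failed requests the SLO tolerates across all
    of the group's traffic in the window: (1 - slo) * total requests.
    """
    total_requests = sum(s.requests for s in stats)
    budget = (1 - slo) * total_requests
    if budget == 0:
        return []  # no traffic, so there is no budget to attribute
    shares = [(s.endpoint, s.errors / budget) for s in stats]
    # Largest contributors first.
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


stats = [
    EndpointStats("GET /api/:version/projects", requests=150_000, errors=60),
    EndpointStats("POST /api/:version/issues", requests=50_000, errors=20),
]
for endpoint, share in budget_spent_by_endpoint(stats):
    print(f"{endpoint}: {share:.0%} of the error budget")
```

With 200,000 requests at a 99.95% SLO, the group tolerates 100 failed requests, so the first (made-up) endpoint has spent 60% of the budget and the second 20%.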

I think we could create a dashboard per stage group that looks very similar to a service overview dashboard (for example, the one for web).

The first row would be the summary (what we currently show on the stage group dashboard). Below it, we would show things the way we show them on service overviews:

  • Aggregated SLIs: separate error and apdex ratios, with multi-window, multi-burn-rate (MWMBR) thresholds based on the 99.95% SLO - gitlab-com/runbooks!4225 (merged)
  • One row per SLI: separate error and apdex ratios, with the same thresholds - gitlab-com/runbooks!4099 (merged)
  • One row per SLI (collapsed by default) that breaks that SLI down per "significant label", that is, per service, endpoint, worker, etc. - gitlab-com/runbooks!4129 (merged)
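The MWMBR thresholds in the first two rows follow directly from the SLO. This Python sketch assumes the burn-rate factors and window pairs from the Google SRE Workbook's multiwindow, multi-burn-rate alerting example (a factor of 14.4 over 1h paired with 5m, and 6 over 6h paired with 30m). Whether the runbooks dashboards use exactly these values is an assumption, and all names are hypothetical.

```python
# Burn-rate factors and windows from the Google SRE Workbook's multiwindow,
# multi-burn-rate alerting example. ASSUMPTION: the runbooks dashboards may
# use different factors or windows.
BURN_RATES = {
    # long window: (short window, burn-rate factor)
    "1h": ("5m", 14.4),  # spends 2% of a 30-day budget in 1 hour
    "6h": ("30m", 6.0),  # spends 5% of a 30-day budget in 6 hours
}


def error_ratio_thresholds(slo):
    """Maximum tolerated error ratio per long window for a given SLO."""
    budget = 1 - slo
    return {window: factor * budget for window, (_, factor) in BURN_RATES.items()}


def apdex_thresholds(slo):
    """Minimum tolerated apdex per long window (the mirror of the error ratio)."""
    return {window: 1 - t for window, t in error_ratio_thresholds(slo).items()}


def burning_too_fast(long_window_ratio, short_window_ratio, threshold):
    """Multiwindow check on error ratios: both windows must exceed the
    threshold, so a breach stops reporting soon after the problem does."""
    return long_window_ratio > threshold and short_window_ratio > threshold
```

For the 99.95% SLO the error budget is 0.05%, which gives error-ratio thresholds of 0.72% (1h) and 0.3% (6h), or equivalently apdex thresholds of 99.28% and 99.7%. Requiring the short window to agree with the long one is what keeps the lines from staying red long after an incident has ended.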

Other tasks:

  • Allow Sidekiq SLIs to be present - #1313 (closed)
  • Split these into separate dashboards per stage group (gitlab-com/runbooks!4246 (merged)), allowing:
    • Links to and from the main stage group dashboards
    • Inclusion of SLIs with static categories
    • Working Kibana links
Edited Jan 31, 2022 by Sean McGivern