Review Request - Error Budgets: Package group error budget review
Context
While evaluating the Package Group error budget dashboard and comparing it with the General Metrics/SLAs, it seemed that our error budget was being affected by our Rails product category work, because the Container Registry has been running at 100% for the last 28 days.
When I tried to investigate this in the Error Budget dashboard, both error rate charts (API and Web) were empty, and it was not clear to me how to identify where errors were occurring or how to track them down. I also noticed that 0ms is red in the General SLA dashboard. Should it be green?
Error Budget dashboard request
There are a few concerns I'd like addressed:
- The API Error Rate and WEB Error Rate charts both show "no data". Is this expected?
- How should we use the Error Budget dashboard to identify areas that need to be addressed and prioritised?
- For groups with a separate service beyond the GitLab codebase (for example, ~"group::configure" with the K8s agent, ~"group::package" with ~"Service::Container Registry", and ~"group::release" with ~"Service::Pages"), how can those teams better understand what's contributing to their error budget usage?
Thanks in advance for your consideration/help.
References
/cc @sgoldstein @marin @nicolewilliams @nicholasklick @trizzi
Edited by Rachel Nienaber