Displaying error budget for a stage group
For this issue, we want to show the error budgets to the stage group in the following form:
4 panels in a row called "Error Budgets (past 28 days)":
- Availability: The overall success ratio, shown as a percentage
- Minutes spent: The number of minutes spent out of the ~20 minutes available
- Minutes remaining: The number of minutes remaining from ~20 minutes available (based on the 99.95% availability target)
- Info panel with links to the handbook and the calculation.
The simplified calculation is:
Availability
(the number of operations with a satisfactory apdex + the number of operations without errors) / (the total number of apdex measurements + the total number of operations)
This gives the availability as a ratio, which the panel displays as a percentage. The calculation would use values from the previous 28 days.
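As a minimal sketch of the simplified calculation above, assuming the four counts have already been summed over the 28-day window. The function and parameter names are hypothetical and do not correspond to any existing metric or recording rule, and treating a window with no traffic as fully available is an assumption:

```python
def availability(
    apdex_satisfied: int,
    apdex_total: int,
    ops_without_errors: int,
    ops_total: int,
) -> float:
    """Return availability as a ratio between 0 and 1.

    Numerator: operations with a satisfactory apdex + operations without errors.
    Denominator: total apdex measurements + total operations.
    """
    good = apdex_satisfied + ops_without_errors
    total = apdex_total + ops_total
    if total == 0:
        # Assumption: with no measurements at all, report full availability.
        return 1.0
    return good / total


if __name__ == "__main__":
    ratio = availability(
        apdex_satisfied=990,
        apdex_total=1000,
        ops_without_errors=990,
        ops_total=1000,
    )
    print(f"{ratio:.2%}")
```

Multiplying the returned ratio by 100 gives the percentage shown in the Availability panel.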
Minutes remaining
We could calculate the remaining budget in seconds as follows, with the target and the availability expressed as ratios (for example, 0.9995):
((1 - <target>) * 28 * 86400) - ((1 - <availability>) * 28 * 86400)
Grafana will then turn this into minutes/hours as appropriate.
With a target of 99.95%, the budget for 28 days (40,320 minutes) is 20.16 minutes. This gives the following results for different availabilities, rounded to whole minutes:
| Availability | Minutes Spent | Minutes Remaining |
|---|---|---|
| 100% | 0 | 20 |
| 99.97% | 12 | 8 |
| 99.95% | 20 | 0 |
| 99.90% | 40 | -20 |
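The figures in the table can be checked with a short sketch of the seconds-based formula. The helper names are hypothetical, and the conversion to minutes is done here only for comparison with the table, since Grafana would handle unit display itself:

```python
WINDOW_SECONDS = 28 * 86400  # 28 days, as in the formula above


def budget_seconds(target: float) -> float:
    """Total error budget for the window, in seconds."""
    return (1 - target) * WINDOW_SECONDS


def spent_seconds(availability: float) -> float:
    """Budget consumed at the given availability, in seconds."""
    return (1 - availability) * WINDOW_SECONDS


def remaining_seconds(target: float, availability: float) -> float:
    """Budget left in seconds; negative once the budget is overspent."""
    return budget_seconds(target) - spent_seconds(availability)


if __name__ == "__main__":
    target = 0.9995
    for avail in (1.0, 0.9997, 0.9995, 0.9990):
        spent = spent_seconds(avail) / 60
        remaining = remaining_seconds(target, avail) / 60
        print(f"{avail:.2%}: spent {spent:.2f} min, remaining {remaining:.2f} min")
```

The unrounded values differ slightly from the table (for example, 12.10 minutes spent at 99.97%), because the full budget is 20.16 minutes rather than exactly 20.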
The purpose of the error budget is to inform prioritization decisions and to drive action.
Edited by Rachel Nienaber