Improvements for reporting on error budgets in the engineering allocation weekly

During the weekly Infradev & Engineering Allocation sync call we report briefly on error budget.

During the call the following hurdles to the process were raised:

1. The 28d rolling average makes it hard to see from week to week how the results differ and what has changed

Proposal: In #1256 (closed) (part of &664) we're working on using the range selected in Grafana to show the availability numbers for that range. This would allow anyone using that dashboard to see availability scores for any selected range since the recordings started.

As a summary to use during the weekly Infradev & Engineering Allocation we can update the Infradev Status Report issues to show the past 7 days next to 28 days for the availability column:

image

The 28 day number exported in error budget reports and Sisense remains the "official" number, but for weekly progress we can use the 7 day one.
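To illustrate how a 7 day and a 28 day number could sit side by side in the availability column, here is a minimal sketch. It assumes availability is the ratio of successful to total operations and that per-day counts are available as `(successful, total)` pairs. That data shape and the function names are hypothetical and are not taken from the actual dashboards or reports.

```python
from typing import Sequence, Tuple

# Hypothetical shape: one (successful operations, total operations) pair per day,
# ordered oldest to newest.
DailyCount = Tuple[int, int]


def availability(daily: Sequence[DailyCount], window_days: int) -> float:
    """Return successful/total over the most recent `window_days` entries."""
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    recent = daily[-window_days:]
    good = sum(g for g, _ in recent)
    total = sum(t for _, t in recent)
    if total == 0:
        raise ValueError("no operations recorded in the window")
    return good / total


def availability_column(daily: Sequence[DailyCount]) -> str:
    """Format the 7 day number next to the 28 day number for a report cell."""
    return f"{availability(daily, 7):.2%} (7d) / {availability(daily, 28):.2%} (28d)"
```

With three good weeks followed by one worse week, the 7 day number drops visibly while the 28 day number moves only a little, which is the week-over-week signal the sync call is missing.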

2. It is hard to see which issues have been created to address or investigate these numbers for certain stage groups

For this @andrewn suggested a process similar to the one we use for infradev issues. We could ask the owning groups to apply a specific label to issues for work that would improve their error budget score. For example: Error Budget Improvement

We could then add a column to the infradev status report, counting and linking to the open issues with this label. This would mean that a group with a poor availability score but with linked issues has performed investigations and is planning work on improvements, while a group with a poor score and no issues would need to investigate what happened in the past week.

image
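The extra column could be produced along these lines. This is only a sketch: it assumes the issues have already been fetched as dictionaries with `state` and `labels` fields (the shape the GitLab issues API returns), and the `label_name[]` filter URL and the helper names are assumptions for illustration, not the report's actual implementation.

```python
from typing import Iterable, Mapping
from urllib.parse import urlencode

LABEL = "Error Budget Improvement"  # example label name from the proposal


def count_open_labeled(issues: Iterable[Mapping], label: str = LABEL) -> int:
    """Count issues that are still open and carry the given label."""
    return sum(
        1
        for issue in issues
        if issue.get("state") == "opened" and label in issue.get("labels", [])
    )


def issues_link(group_issues_url: str, label: str = LABEL) -> str:
    """Build a filtered issue-list URL (assumed query format)."""
    query = urlencode({"state": "opened", "label_name[]": label})
    return f"{group_issues_url}?{query}"


def report_cell(issues: Iterable[Mapping], group_issues_url: str) -> str:
    """Render the column value as a markdown link showing the open issue count."""
    return f"[{count_open_labeled(issues)}]({issues_link(group_issues_url)})"
```

A cell showing `0` next to a poor availability score would then flag a group that still needs to investigate.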

Edited by Bob Van Landuyt