Error Budget MVC
Problem to solve
Organizations need a way to balance stability and velocity. A common way to achieve this is to use error budgets. An error budget lets an organization set an agreed-upon tolerance for errors and track outages against it, so it can tell whether it is overshooting or undershooting on stability.
GitLab should provide tools to help establish and track an error budget.
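As a worked example of establishing a budget, assume the tolerance is expressed as an availability target A over a time window T (both symbols are introduced here for illustration, not defined by this proposal). The budget B is then the downtime the organization has agreed to accept in that window; for a 99.9% target over 30 days:

```latex
B = (1 - A)\,T
B = (1 - 0.999) \times (30 \times 24 \times 60)\ \text{min} = 0.001 \times 43200\ \text{min} = 43.2\ \text{min}
```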
The simplest way to track an error budget is to track the duration of each outage. This is fairly naive because it does not account for severity, but a severity multiplier could be added later. It also lets us track the budget even when we cannot measure the error rate across the full set of requests.
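A minimal sketch of time-based budget tracking, assuming the budget is expressed in minutes of allowed downtime. The `Outage` type and its `severity_multiplier` field are hypothetical; a multiplier of 1.0 reproduces the pure duration-based approach, and a later severity weighting would only change that value.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Outage:
    # Hypothetical record of one incident, e.g. derived from an alert's open/clear times.
    duration_minutes: float
    # 1.0 means pure time-based tracking; a future severity weighting could raise it.
    severity_multiplier: float = 1.0


def consumed_budget(outages: list[Outage]) -> float:
    """Sum each outage's duration, scaled by its (optional) severity multiplier."""
    return sum(o.duration_minutes * o.severity_multiplier for o in outages)


def remaining_budget(budget_minutes: float, outages: list[Outage]) -> float:
    """Return the minutes left in the budget; a negative value means it is overspent."""
    return budget_minutes - consumed_budget(outages)
```

For example, `remaining_budget(60, [Outage(10), Outage(5, 2.0)])` counts the second outage twice and leaves 40.0 minutes of a 60-minute budget.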
To track incident duration, we can build on the beginnings of our incident management feature set. We can already open issues automatically from alerts, and we can extend this to track duration by logging when an alert clears.
To achieve this, we should:
- Comment in the issue when an alert has cleared.
- Calculate how long the issue was open and store the value using time tracking.
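The two steps above could be combined into a single comment, sketched below under stated assumptions: the helper names are hypothetical, the open and clear timestamps are assumed to come from the alert, and the duration is recorded with GitLab's `/spend` quick action, written in hours and minutes (for example `1h 30m`) on its own line. Sub-minute outages are rounded up to one minute so they still register against the budget.

```python
from datetime import datetime, timezone


def format_gitlab_duration(seconds: int) -> str:
    """Format seconds as hours and minutes for a /spend quick action, e.g. "1h 30m".

    Rounds down to whole minutes, but never below one minute, so even a very
    short outage is recorded.
    """
    minutes = max(seconds // 60, 1)
    hours, minutes = divmod(minutes, 60)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if minutes:
        parts.append(f"{minutes}m")
    return " ".join(parts)


def alert_cleared_comment(opened_at: datetime, cleared_at: datetime) -> str:
    """Build the (hypothetical) comment posted to the issue when its alert clears."""
    if cleared_at < opened_at:
        raise ValueError("alert cleared before it was opened")
    seconds = int((cleared_at - opened_at).total_seconds())
    duration = format_gitlab_duration(seconds)
    # The quick action sits on its own line so it is parsed as a command.
    return (
        f"Alert cleared at {cleared_at.isoformat()}. "
        f"Incident duration: {duration}.\n"
        f"/spend {duration}"
    )


opened = datetime(2020, 1, 1, 10, 0, tzinfo=timezone.utc)
cleared = datetime(2020, 1, 1, 11, 30, tzinfo=timezone.utc)
print(alert_cleared_comment(opened, cleared))
```

Using the issue's time tracking rather than a new field means the value is stored wherever GitLab already keeps spent time for the issue.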
While we do not offer global reporting yet, this field can be retrieved via the API and exported via CSV (https://docs.gitlab.com/ee/user/project/issues/csv_export.html). This allows manual reports to be built while we develop broader reporting capabilities.
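As an example of such a manual report, the sketch below totals spent time across an exported CSV. It assumes the export contains a `Time Spent` column holding seconds; check the column name and unit against the CSV export documentation before relying on it. Per issue, the issues API's `time_stats` endpoint also reports `total_time_spent` in seconds.

```python
import csv
import io


def total_time_spent_hours(csv_text: str, column: str = "Time Spent") -> float:
    """Sum an assumed seconds-valued column across all exported issues, in hours.

    Rows with an empty value (no time tracked) are skipped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    total_seconds = 0
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:
            total_seconds += int(value)
    return total_seconds / 3600


sample = "Issue ID,Title,Time Spent\n1,DB outage,5400\n2,API errors,1800\n3,No data,\n"
print(total_time_spent_hours(sample))
```

The column name is a parameter so the same helper can be pointed at a differently named field if the export format differs.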
Permissions and Security
Parker, Product Manager, https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas#parker-product-manager
Delaney, Development Team Lead, https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas#delaney-development-team-lead
Sasha, Software Developer, https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas#sasha-software-developer
Sidney, Systems Administrator, https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas#sidney-systems-administrator