Proposal - History of master broken/review-apps deploy errors per month
Motivation
If you asked me either of these two questions, I would struggle to answer:
- “What are the top 5 issues for master stability at GitLab?”
- “What are the top 5 issues for review app deployments at GitLab?”
The purpose of this proposal is to let anybody answer those questions in a self-service manner.
(It's an application of the "see it and find it" principle of our areas of responsibility.)
The proposal
Gather all of the "master broken" and "review apps deploy" problems we see each month (e.g. in a generated issue: one issue for master stability, and another one for review app deployments), and group them by root cause to understand what we could work on to prevent them (a manual process at first). Then list the root causes in descending order of frequency (i.e. most frequent problem first), with a "comments" column.
We could then review them asynchronously each month (as the entire team or a single person), and create issues to address the most frequent root causes.
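The grouping and ordering step could eventually be scripted. Below is a minimal sketch in Python, assuming each problem has already been labelled by hand with a month, a category, and a root cause. The record layout, the category names, the root-cause labels, and the helper names `top_root_causes` and `render_table` are all hypothetical and only serve as illustration; they are not an existing tool.

```python
from collections import Counter

# Hypothetical, hand-labelled records: (month, category, root_cause).
incidents = [
    ("2020-01", "master-broken", "flaky spec"),
    ("2020-01", "master-broken", "merge race condition"),
    ("2020-01", "master-broken", "flaky spec"),
    ("2020-01", "review-app-deploy", "cluster out of resources"),
    ("2020-01", "master-broken", "flaky spec"),
    ("2020-01", "review-app-deploy", "helm timeout"),
    ("2020-01", "review-app-deploy", "cluster out of resources"),
    ("2020-02", "master-broken", "merge race condition"),
]


def top_root_causes(records, category, month, limit=5):
    """Count root causes for one category and month, most frequent first."""
    counts = Counter(
        cause
        for rec_month, rec_category, cause in records
        if rec_month == month and rec_category == category
    )
    return counts.most_common(limit)


def render_table(rows):
    """Render (root_cause, count) pairs as a table with an empty comments column."""
    lines = [
        "| Root cause | Occurrences | Comments |",
        "| --- | --- | --- |",
    ]
    lines += [f"| {cause} | {count} | |" for cause, count in rows]
    return "\n".join(lines)


if __name__ == "__main__":
    for category in ("master-broken", "review-app-deploy"):
        print(category)
        print(render_table(top_root_causes(incidents, category, "2020-01")))
```

The output is one table per category for the chosen month, which could be pasted into the monthly issue and annotated in the comments column during the review.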
Pros
- Systematic approach
- Spreads knowledge about what needs to be worked on first to improve master stability
- Data-driven justification for how we try to improve two of our core metrics (master-pipeline-stability and review-app-deployment-success-rate).
Cons
- Manual, and possibly tedious process
Other ideas
Possible inspiration for the layout: https://app.periscopedata.com/shared/9e320ebb-43f6-4b30-9f72-7f02636ae410?