Show runner failure rates in Fleet Dashboard - Groups
Insight
Runner failures were consistently voted the most important feature on the dashboard when validating the solution in https://gitlab.com/gitlab-org/ux-research/-/issues/2403. Seeing runner failure trends over time would help users identify larger problems that cause repeated outages in runner performance. It would show when action is needed to fix a problem and how often failures happen (for example, only on Friday nights or all the time).
Supporting evidence
So, so it would be very interesting to see the error rates, like because of runners dying in the middle of something, for example, as like the docker engine thing I mentioned earlier that that's, that's very valid because that, that's something like as, as mentioned like we run on on spot instances. So if that be like, if that is becoming a bigger of a problem that runners are like dying all the time too much and it starts to like make people annoyed, that would be very interesting to see at, is there like a trend that, that this is happening more and more? Do we need to kind of do something about like trying to make sure that like, like, or do something about the automatic retry for example. We don't currently, it's not that big of a problem, but when is it big enough of a problem that we would need to spend a little time in, in looking into how could we make the, the, the jobs a bit more like robust for these runner failures that okay. Are happening.
Action
- Add a failure rate stat covering all runners owned by the group
- Include a "View details" link that takes the user to a full page with a chart of failure trends and a table of failure rates per runner
- Add failure rate to runner details page
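
To make the proposed stat concrete, here is a minimal sketch of how a group-level failure rate and the per-runner table could be derived from job records. This issue does not define "failure rate", so the definition used here (jobs that failed because of the runner, divided by all finished jobs) is an assumption. The `JobRecord` type, its fields, and the `failure_rates` function are hypothetical names for illustration, not part of GitLab.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRecord:
    """Hypothetical record of one finished job (not a GitLab type)."""
    runner_id: int
    failed_on_runner: bool  # assumption: True when the runner caused the failure


def failure_rates(jobs):
    """Return (group_rate, per_runner_rates) for the given finished jobs.

    Assumed definition: rate = runner-caused failures / finished jobs.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for job in jobs:
        totals[job.runner_id] += 1
        if job.failed_on_runner:
            failures[job.runner_id] += 1

    # Per-runner rates feed the table on the details page.
    per_runner = {rid: failures[rid] / totals[rid] for rid in totals}

    # The group rate is weighted by job count, not an average of runner rates.
    total_jobs = sum(totals.values())
    group_rate = sum(failures.values()) / total_jobs if total_jobs else 0.0
    return group_rate, per_runner
```

Weighting the group stat by job count keeps a rarely used runner with a single failed job from dominating the headline number. Computing the same figures per day or week would give the trend chart described above.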
Resources
Tasks
- Assign this issue to the appropriate Product Manager, Product Designer, or UX Researcher.
- Add the appropriate Group (such as ~"group::source code") label to the issue. This helps identify and track actionable insights at the group level.
- Link this issue back to the original research issue in the GitLab UX Research project and the Dovetail project.
- Adjust confidentiality of this issue if applicable.