Update to measure Mean Time to First Failure

https://about.gitlab.com/handbook/engineering/quality/performance-indicators/#gitlab-project-merge-request-pipeline-average-time-to-failure

From our recent Key Review https://docs.google.com/document/d/1xbXfSwHrJZvtJIpj0t45IkWy_sEzEXbDj_56dSSJTQc/edit#bookmark=id.7emgak9n7qlv

Mek: This is in the video, but maybe not in the written agenda. We have average time to failure; it improved in April, with an uptick in May. Given the data lag, we believe it will smooth out.

Kyle: This is full pipeline duration, not time to first failure. https://about.gitlab.com/handbook/engineering/quality/performance-indicators/#gitlab-project-merge-request-pipeline-average-time-to-failure

Sid: I don’t understand, it’s not called pipeline duration right now?

Kyle: It measures how long a pipeline takes to finish, given that it fails.

Sid: As a developer, I’d like to see a shorter time to first failure, so I know where to focus.

Kyle: That’s something we can look at more closely. When we experimented we got feedback that people want the full set of feedback rather than short circuiting everything else that runs.

Sid: I’m not proposing stopping the rest either. The idea is that I want to know whether there is a problem and that it’s not green. The way we will improve this is by scheduling the tests most likely to fail early: run the ones that failed last time first. That will help bring the time down and help developers. After that, keep running and report all the failures. It’s really average time to first failure that I’m after, to improve the feedback cycle for the developer.
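
The scheduling idea described above can be sketched as follows. This is a minimal illustration in Python, not how the GitLab pipeline actually orders its jobs; the names `order_tests`, `run_all`, and the `run_test` callback are hypothetical.

```python
def order_tests(tests, last_failed):
    """Put tests that failed last time first, keeping the original order otherwise.

    sorted() is stable and False sorts before True, so previously failed
    tests move to the front without reshuffling the rest.
    """
    return sorted(tests, key=lambda name: name not in last_failed)


def run_all(tests, run_test):
    """Run every test without stopping at the first failure.

    run_test is a hypothetical callback returning True when a test passes.
    Returns (position of the first failure or None, list of all failed tests).
    """
    first_failure_at = None
    failures = []
    for position, name in enumerate(tests):
        if not run_test(name):
            failures.append(name)
            if first_failure_at is None:
                first_failure_at = position
    return first_failure_at, failures


# Illustrative data only: "c" failed in the previous run.
tests = ["a", "b", "c", "d"]
ordered = order_tests(tests, last_failed={"c"})
```

With this ordering the first failure surfaces earlier in the run, while the full list of failures is still collected at the end.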

Mek: We will rename it to average time to first failure.

Task

Measure the time to first failure
Rename KPI to Average time to first failure
Update the KPI description to remove link to this issue
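
To make the difference between the two measurements concrete, here is a rough Python sketch. It assumes each pipeline record carries a creation time, a finish time, and per-job statuses with finish times; the record layout and function names are hypothetical and do not reflect the actual data source behind the KPI.

```python
from datetime import datetime
from statistics import mean

# Hypothetical sample records, for illustration only.
pipelines = [
    {   # fails after 15 minutes, finishes after 60
        "created_at": datetime(2021, 6, 1, 10, 0),
        "finished_at": datetime(2021, 6, 1, 11, 0),
        "jobs": [
            {"status": "failed", "finished_at": datetime(2021, 6, 1, 10, 15)},
            {"status": "success", "finished_at": datetime(2021, 6, 1, 11, 0)},
        ],
    },
    {   # first failure after 30 minutes, finishes after 90
        "created_at": datetime(2021, 6, 1, 12, 0),
        "finished_at": datetime(2021, 6, 1, 13, 30),
        "jobs": [
            {"status": "failed", "finished_at": datetime(2021, 6, 1, 12, 45)},
            {"status": "failed", "finished_at": datetime(2021, 6, 1, 12, 30)},
            {"status": "success", "finished_at": datetime(2021, 6, 1, 13, 30)},
        ],
    },
    {   # green pipeline, excluded from both averages
        "created_at": datetime(2021, 6, 1, 14, 0),
        "finished_at": datetime(2021, 6, 1, 14, 40),
        "jobs": [
            {"status": "success", "finished_at": datetime(2021, 6, 1, 14, 40)},
        ],
    },
]


def minutes_to_first_failure(pipeline):
    """Minutes from pipeline creation to its earliest failed job, or None if green."""
    failed = [j["finished_at"] for j in pipeline["jobs"] if j["status"] == "failed"]
    if not failed:
        return None
    return (min(failed) - pipeline["created_at"]).total_seconds() / 60


def average_time_to_first_failure(pipelines):
    """Mean minutes to first failure across pipelines that had a failure."""
    values = [m for m in map(minutes_to_first_failure, pipelines) if m is not None]
    return mean(values) if values else None


def average_failed_pipeline_duration(pipelines):
    """Mean full duration in minutes of failed pipelines (what the KPI measured before)."""
    values = [
        (p["finished_at"] - p["created_at"]).total_seconds() / 60
        for p in pipelines
        if minutes_to_first_failure(p) is not None
    ]
    return mean(values) if values else None
```

On the sample data, the average full duration of failed pipelines is much longer than the average time to first failure, which is the gap the rename is meant to reflect.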

Edited Jun 09, 2021 by Mek Stittri