Job success, failure, timing metrics
Problem to solve
Seeing the history of pipeline successes and failures in CI/CD Charts is useful at a macro level for spotting a growing trend of pipeline failures, but it only scratches the surface. Users cannot dig into a specific pipeline or job to see how often it fails, or how long it takes to succeed or fail over time.
If users could get at the primitives of job success/failure and timing, they could speed up their pipelines and increase the likelihood of successful deploys.
Intended users
Further details
Rachel and Delaney have a problem today: teams are getting green pipelines and merging code, but automated integration tests are failing or manual tests are finding issues that the team should have found earlier. This results in releasing a feature that is turned off, releasing a feature with a bug, or delaying a release to fix and revalidate.
If Rachel and Delaney could get the failure data to review, they could help the teams build a faster feedback loop with additional testing.
Some use cases we'll need to handle/account for are:
- What happens to historical data when a pipeline is changed?
- What happens to historical data when a pipeline is deleted?
- What happens when data collection is at the extreme?
- How long should data persist?
Proposal
Capture job and pipeline data (start time, end time, and state) over some timeframe. Make this data downloadable by users, and eventually add it to the pipeline > Charts page to provide in-app research capabilities.
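To illustrate what users could do with the downloaded data, here is a minimal sketch in Python. It assumes each exported record carries a job name, start time, end time, and state. The `JobRecord` type, its field names, the `summarize` helper, and the state values `"success"` and `"failed"` are hypothetical and chosen only for this example; they are not an existing export format.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class JobRecord:
    """One exported job run. Field names are assumptions for this sketch."""
    job_name: str
    started_at: datetime
    finished_at: datetime
    state: str  # assumed values: "success" or "failed"; others are ignored


def summarize(records):
    """Return per-job failure rate and mean duration, split by outcome."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.job_name].append(record)

    summary = {}
    for name, runs in grouped.items():
        # Collect durations in seconds, keyed by final state.
        durations = {"success": [], "failed": []}
        for run in runs:
            if run.state in durations:
                elapsed = (run.finished_at - run.started_at).total_seconds()
                durations[run.state].append(elapsed)

        # Only runs that ended in success or failure count toward the rate.
        finished = len(durations["success"]) + len(durations["failed"])
        summary[name] = {
            "runs": finished,
            "failure_rate": (
                len(durations["failed"]) / finished if finished else None
            ),
            "mean_success_seconds": (
                mean(durations["success"]) if durations["success"] else None
            ),
            "mean_failed_seconds": (
                mean(durations["failed"]) if durations["failed"] else None
            ),
        }
    return summary
```

Reporting success and failure durations separately matters because a job that fails fast and one that fails after a long run point to different problems. Runs in any other state (for example, canceled) are left out of the rate in this sketch; whether that is the right treatment is an open question for the proposal.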