Fix a failing Pipeline Flow - Instrumentation & Observability
## **Problem**

We lack the instrumentation to answer two fundamental questions about the Fix Pipeline flow:

1. Is it delivering value to developers?
2. Is it behaving as designed?

Without these metrics, we cannot measure whether the flow is saving developer time or resolving failures effectively.

## **Proposal**

Build a clear, reusable instrumentation layer for the Fix CI/CD Pipeline with Duo flow that enables product, engineering, and data teams to measure adoption, performance, failure patterns, and step-level behavior, with an eye toward reusability across all Duo Workflow flows.

### Flow Impact Metrics

Why: Measure the tangible time savings the feature delivers to developers, providing a direct indicator of business value and ROI to justify continued investment.

<table>
  <tr>
    <th>Priority</th>
    <th>Metric</th>
    <th>Target Visualization</th>
    <th>Notes</th>
  </tr>
  <tr>
    <td>High</td>
    <td>Average time from pipeline failure to green pipeline</td>
    <td>Tableau</td>
    <td>https://gitlab.com/gitlab-data/product-analytics/-/work_items/3211+</td>
  </tr>
  <tr>
    <td>Medium</td>
    <td>Median cost per flow, or LLM requests used by the flow</td>
    <td>Tableau</td>
    <td>LLM calls - See https://10az.online.tableau.com/#/site/gitlab/views/AgentUsageEngagement/AgentSuccessMetrics?:iid=1</td>
  </tr>
</table>

### Standard Flow Metrics

Why: Understand overall feature adoption and where users drop off to prioritize reliability and UX improvements.
| Priority | Metric | Target Visualization | Notes |
|----------|--------|----------------------|-------|
| High | Number of times the Fix Pipeline flow was triggered | Tableau | Available in Kibana |
| High | Number of times the Fix Pipeline flow completed successfully | Tableau | Available in Kibana |
| High | Number of times the Fix Pipeline flow failed | Tableau | Available in Kibana |
| Medium | Number of flows stopped by the user | Tableau | Available in Kibana |
| High | Conversion rate: number of flows that resulted in a fix / number of flows triggered | Tableau | |
| High | Conversion rate: number of flows that resulted in a comment / number of flows triggered | Tableau | |
| High | Conversion rate: number of flows that resulted in an auto-retry / number of flows triggered | Tableau | |
| Medium | Median duration of the flow | Tableau | |
| Medium | Number of commits made by the pipeline bot that were later reverted | | |

### Failure Classification

Why: Categorize why flows fail to prioritize the highest-impact fixes and track whether the LLM is correctly scoping problems it can solve.

<table>
  <tr>
    <th>Priority</th>
    <th>Metric</th>
    <th>Visualization</th>
    <th>Notes</th>
  </tr>
  <tr>
    <td>Medium</td>
    <td>Failure reason/category - can we log this information based on the LLM reasoning?</td>
    <td>Tableau</td>
    <td></td>
  </tr>
  <tr>
    <td>Medium</td>
    <td>Commonly suggested fix - is it to retry the job, push an MR out, or change the CI config?</td>
    <td>Tableau</td>
    <td></td>
  </tr>
</table>

### Flow Step Level Metrics (to implement the above)

Why: Understand how the flow executes internally to identify bottlenecks, unexpected paths, and opportunities to improve the flow's decision-making.

| Priority | Metric | Visualization | Notes |
|----------|--------|---------------|-------|
| Low | Step duration | Kibana | |
| Low | Step status | Kibana | |
| Medium | Step failure reason | Kibana | |

### **Filtering / Segmentation Dimensions**

* GitLab project
* Trigger type (DAP automation vs. manual)
* Pipeline source type (Merge Request, Scheduled, Push, etc.)
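To make the segmentation dimensions above filterable in Kibana and Tableau, every lifecycle event the flow emits would need to carry them as structured fields. A minimal sketch of what such an event could look like is below; the event names, field names, and `emit` helper are illustrative assumptions, not the existing Duo Workflow instrumentation API:

```python
import json
import time
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class FlowEvent:
    """One structured lifecycle event for the Fix Pipeline flow (hypothetical schema)."""
    event: str             # "triggered" | "completed" | "failed" | "stopped"
    flow_id: str
    project_id: int        # GitLab project (segmentation dimension)
    trigger_type: str      # "dap_automation" | "manual" (segmentation dimension)
    pipeline_source: str   # "merge_request" | "schedule" | "push" | ... (segmentation dimension)
    outcome: Optional[str] = None  # "fix" | "comment" | "auto_retry" on terminal events
    timestamp: float = 0.0

def emit(event: FlowEvent) -> str:
    """Serialize the event as one JSON line, as a log shipper feeding Kibana would consume it."""
    event.timestamp = event.timestamp or time.time()
    return json.dumps(asdict(event))

line = emit(FlowEvent(event="completed", flow_id="f-123", project_id=42,
                      trigger_type="manual", pipeline_source="merge_request",
                      outcome="auto_retry"))
```

Keeping the dimensions on every event (rather than only on the trigger event) avoids joins at query time when slicing any metric by project, trigger type, or pipeline source.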
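Most of the proposed Tableau metrics are aggregations over these raw events. As a sanity check that the event schema is sufficient, here is a hedged sketch (field names are assumptions matching nothing in the current codebase) of deriving a conversion rate and the median flow duration from a small batch of events:

```python
from statistics import median

# Toy event stream: three flows, one ending in a fix, one failing, one commenting.
events = [
    {"event": "triggered", "flow_id": "a", "ts": 0},
    {"event": "completed", "flow_id": "a", "ts": 90, "outcome": "fix"},
    {"event": "triggered", "flow_id": "b", "ts": 10},
    {"event": "failed",    "flow_id": "b", "ts": 40},
    {"event": "triggered", "flow_id": "c", "ts": 20},
    {"event": "completed", "flow_id": "c", "ts": 50, "outcome": "comment"},
]

triggered = [e for e in events if e["event"] == "triggered"]
fixes = [e for e in events if e.get("outcome") == "fix"]

# Conversion rate: flows that resulted in a fix / flows triggered (1/3 here).
fix_conversion = len(fixes) / len(triggered)

# Median duration: terminal event timestamp minus trigger timestamp, per flow.
starts = {e["flow_id"]: e["ts"] for e in triggered}
durations = [e["ts"] - starts[e["flow_id"]]
             for e in events if e["event"] in ("completed", "failed")]
median_duration = median(durations)  # durations are [90, 30, 30], so 30
```

The same pattern covers the comment and auto-retry conversion rates by filtering on the `outcome` field instead.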