Fix a failing Pipeline Flow - Instrumentation & Observability
## **Problem**
We lack the instrumentation to answer two fundamental questions about the Fix Pipeline flow:
1. Is it delivering value to developers?
2. Is it behaving as designed?
Without these metrics, we cannot measure whether the flow is saving developer time or resolving failures effectively.
## **Proposal**
Build a clear, reusable instrumentation layer for the Fix CI/CD Pipeline with Duo flow that enables product, engineering, and data teams to measure adoption, performance, failure patterns, and step-level behavior — with an eye toward reusability across all Duo Workflow flows.
### Flow Impact Metrics
Why: Measure the tangible time savings the feature delivers to developers, providing a direct indicator of business value and ROI to justify continued investment.
<table>
<tr>
<th>Priority</th>
<th>Metric</th>
<th>Target Visualization</th>
<th>Notes</th>
</tr>
<tr>
<td>HIGH</td>
<td>Average time from pipeline failure to green pipeline</td>
<td>Tableau</td>
<td>
* https://gitlab.com/gitlab-data/product-analytics/-/work_items/3211+
</td>
</tr>
<tr>
<td>MEDIUM</td>
<td>Median cost per flow, or number of LLM requests used by the flow</td>
<td>Tableau</td>
<td>
LLM calls - See https://10az.online.tableau.com/#/site/gitlab/views/AgentUsageEngagement/AgentSuccessMetrics?:iid=1
</td>
</tr>
</table>
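As a rough illustration, the "time from pipeline failure to green pipeline" metric above could be derived by pairing each failed pipeline with the next successful pipeline on the same project and ref. This is a minimal sketch; the event fields (`project_id`, `ref`, `status`, `finished_at`) and the pairing logic are assumptions, not the actual GitLab data model:

```python
from datetime import datetime
from statistics import mean

def avg_time_to_green(events):
    """Average seconds from a pipeline failure to the next green
    pipeline on the same (project, ref). `events` is a list of dicts
    with hypothetical fields: project_id, ref, status ('failed' |
    'success'), finished_at (ISO 8601 string).
    """
    events = sorted(events, key=lambda e: e["finished_at"])
    open_failures = {}  # (project_id, ref) -> time of first unresolved failure
    recoveries = []
    for e in events:
        key = (e["project_id"], e["ref"])
        ts = datetime.fromisoformat(e["finished_at"])
        if e["status"] == "failed":
            # Keep the earliest failure; later failures extend the same incident.
            open_failures.setdefault(key, ts)
        elif e["status"] == "success" and key in open_failures:
            recoveries.append((ts - open_failures.pop(key)).total_seconds())
    return mean(recoveries) if recoveries else None
```

In practice this would run as a Tableau/warehouse query rather than application code, but the pairing semantics (earliest failure to next success per project/ref) would be the same.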
### Standard Flow Metrics
Why: Understand overall feature adoption and where users drop off to prioritize reliability and UX improvements.
| Priority | Metric | Target Visualization | Notes |
|----------|--------|----------------------|-------|
| High | Number of times Fix Pipeline Flow was triggered | Tableau | Available in Kibana |
| High | Number of times Fix Pipeline Flow completed successfully | Tableau | Available in Kibana |
| High | Number of times Fix Pipeline Flow failed | Tableau | Available in Kibana |
| Medium | Number of flows stopped by the user | Tableau | Available in Kibana |
| High | Conversion rate: number of flows that resulted in a fix / number of flows triggered | Tableau | |
| High | Conversion rate: number of flows that resulted in a comment / number of flows triggered | Tableau | |
| High | Conversion rate: number of flows that resulted in an auto-retry / number of flows triggered | Tableau | |
| Medium | Median duration of the flow | Tableau | |
| Medium | Number of commits made by the pipeline bot that were later reverted | | |
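The conversion-rate rows above all share the same shape: count of flows with a given outcome divided by flows triggered. A small sketch, assuming each triggered flow is tagged with one hypothetical outcome label (the labels `fix`, `comment`, `auto_retry` are illustrative, not an existing taxonomy):

```python
def conversion_rates(flow_outcomes):
    """Compute the conversion-rate rows from raw flow outcomes.

    `flow_outcomes` is one outcome string per triggered flow, e.g.
    'fix', 'comment', 'auto_retry', 'failed', 'stopped' (hypothetical
    labels). Returns outcome -> share of all triggered flows.
    """
    triggered = len(flow_outcomes)
    if triggered == 0:
        return {}
    return {
        outcome: flow_outcomes.count(outcome) / triggered
        for outcome in ("fix", "comment", "auto_retry")
    }
```

Keeping the denominator fixed at "flows triggered" (rather than "flows completed") makes the three rates directly comparable and lets them sum with the failure/stop rates to 100%.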
### Failure Classification
Why: Categorize why flows fail to prioritize the highest-impact fixes and track whether the LLM is correctly scoping problems it can solve.
<table>
<tr>
<th>Priority</th>
<th>Metric</th>
<th>Visualization</th>
<th>Notes</th>
</tr>
<tr>
<td>Medium</td>
<td>Failure reason/category. Can we log this information based on the LLM's reasoning?</td>
<td>Tableau</td>
<td></td>
</tr>
<tr>
<td>Medium</td>
<td>
Most commonly suggested fix type
* e.g. retry the job, push an MR, or change the CI configuration
</td>
<td>Tableau</td>
<td></td>
</tr>
</table>
### Flow Step Level Metrics (to implement the above)
Why: Understand how the flow executes internally to identify bottlenecks, unexpected paths, and opportunities to improve the flow's decision-making.
| Priority | Metric | Visualization | Notes |
|----------|--------|---------------|-------|
| Low | Step Duration | Kibana | |
| Low | Step Status | Kibana | |
| Medium | Step Failure Reason | Kibana | |
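The three step-level rows above could be captured by wrapping each flow step and emitting one structured event per step. A minimal sketch; `run_step` and the event field names are assumptions for illustration, not an existing GitLab schema, and in practice the event would be shipped to Kibana via the standard structured logger:

```python
import time

def run_step(flow_id, step_name, func, *args, **kwargs):
    """Run one flow step and build a structured event carrying the
    step-level metrics above: duration, status, and failure reason.
    Returns (step_result, event); the caller logs the event.
    """
    start = time.monotonic()
    result, status, failure_reason = None, "success", None
    try:
        result = func(*args, **kwargs)
    except Exception as exc:
        status, failure_reason = "failed", repr(exc)
    event = {
        "flow_id": flow_id,
        "step_name": step_name,
        "step_duration_s": round(time.monotonic() - start, 3),
        "step_status": status,
        "step_failure_reason": failure_reason,
    }
    return result, event
```

Emitting one event per step with a shared `flow_id` also gives the flow-level metrics for free: flow duration, completion, and failure can all be aggregated from step events, which supports the reusability goal stated in the proposal.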
### **Filtering / Segmentation Dimensions**
* GitLab project
* Trigger type (DAP automation vs. manual)
* Pipeline source type (merge request, scheduled, push, etc.)