# Pipeline Optimization & AI Agent Design Exploration
## Background

In September 2023, a [foundational study](https://docs.google.com/presentation/d/1XL0pm_akNinM9eZd0Q1MCTG92bCHHAGCll5LwAVevoU/edit?slide=id.g27cf391daa8_0_1247#slide=id.g27cf391daa8_0_1247) was conducted exploring how AI could help users optimise cost and performance trade-offs in runner allocation. Across 9 interviews and 4 industry verticals, the study captured [\~48 questions](https://docs.google.com/presentation/d/1XL0pm_akNinM9eZd0Q1MCTG92bCHHAGCll5LwAVevoU/edit?slide=id.g27cf391daa8_0_1247#slide=id.g27cf391daa8_0_1247) that users commonly ask when trying to understand and improve pipeline performance. Six key findings emerged, including that users prefer inline, contextual guidance over chat-based interaction, that speed matters more than cost, and that most users want a "set-and-forget" experience once pipelines are configured.

Since then, our team has been exploring how AI agents could improve CI/CD workflows, both through the [Group CI/CD Analytics dashboard](https://group-ci-cd-analytics-dashboard-prototype-936033.gitlab.io/) (which gives platform teams visibility into pipeline health across projects) and through a broader design vision for agent-led experiences in the Verify stage. This work has naturally led to deeper questions around pipeline optimisation: who needs it most, what the biggest problems are, and whether a specialised AI agent is needed beyond what Duo Chat can already do. This issue consolidates what we've learned across multiple research sources to answer those questions and guide next steps.
## Interactive prototype

The prototype below contains the full research prioritisation, five agentic concept explorations, and all supporting links in one place: [Pipeline Optimization Agentic AI Prototype](https://claude.ai/artifacts/latest/4b44f756-b448-4c2f-87a3-122a92cb23e9)

![Screenshot 2026-04-24 at 14.17.56.png](/uploads/36dcc007649b306977ba99e30e138eec/Screenshot_2026-04-24_at_14.17.56.png){width="475" height="600"}

## What we've done so far

**Research and validation**

As part of an ongoing project on the Group CI/CD Analytics dashboard, we ran [solution validation sessions](https://docs.google.com/presentation/d/1N7GCdyusFmbxmirOqsIIAibfPrpU1zanCtId4-4KUFs/edit?slide=id.g3aaebe04ec3_0_192#slide=id.g3aaebe04ec3_0_192) (5 moderated sessions, 46 survey respondents) that surfaced some useful pipeline optimisation signals worth noting here: job duration was the most requested metric, the Jobs tab scored a SEQ of 4.60, and Duo Chat was positively received for diagnosis and contextual analysis. Two enterprise customer signals also came through during this work: one customer requested an AI agent that reviews pipelines for inefficiencies, and another used Duo to diagnose a cache policy change that added 8 minutes to every build.

**Prioritisation**

From the original \~48 research questions, we selected 10 and scored them across three dimensions (user pain, frequency, and feasibility), using evidence from the 2023 study's six key findings, the solution validation sessions, customer signals, the USAT+ FY27 Q1 survey (762 respondents, 6 themes identified around pipeline and runner performance), and a pipeline failure classification analysis (2,000 failing job logs, LLM-classified). These 10 questions were organised into three tiers and mapped to five agentic concepts, ranging from contextual Duo Chat (lowest autonomy) to autonomous agent activity feeds (highest autonomy).
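To make the scoring model concrete, here is a minimal sketch of the three-dimension prioritisation described above. The question texts, per-dimension scores, and tier cut-offs are hypothetical stand-ins for illustration, not the real prioritisation data:

```python
from dataclasses import dataclass

@dataclass
class Question:
    """One research question scored on the three prioritisation dimensions (1-5 each)."""
    text: str
    pain: int         # user pain
    frequency: int    # how often users hit it
    feasibility: int  # how tractable it is to address

    @property
    def score(self) -> int:
        return self.pain + self.frequency + self.feasibility

def tier(q: Question) -> int:
    """Bucket a question into tiers 1-3 by total score (cut-offs are illustrative)."""
    if q.score >= 13:
        return 1
    if q.score >= 10:
        return 2
    return 3

# Illustrative scores only; see the Prioritisation tab of the prototype for the real data.
questions = [
    Question("What's the time for each job and which are the slowest?", 5, 5, 5),
    Question("What config change caused this pipeline to slow down?", 5, 4, 3),
]

for q in questions:
    print(f"Tier {tier(q)} (score {q.score}): {q.text}")
```

The flat sum treats the three dimensions as equally weighted; a real model might weight user pain more heavily.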
Runner reliability and silent failures ranked as the highest-severity signal across the data, and CI config errors (24.8%) and lint failures (14.3%) emerged as the top failure categories, together accounting for \~39% of all pipeline failures. Full detail is in the Prioritisation tab of the prototype.

## Research sources

| Source | Date | Sample | Key contribution |
|--------|------|--------|------------------|
| [Foundational study — AI for runner cost and performance optimisation](https://docs.google.com/presentation/d/1XL0pm_akNinM9eZd0Q1MCTG92bCHHAGCll5LwAVevoU/edit?slide=id.g27cf391daa8_0_0#slide=id.g27cf391daa8_0_0) | Sept 2023 | 9 interviews, 4 verticals | \~48 research questions, 6 key findings |
| [Solution validation — Group CI/CD Analytics](https://docs.google.com/presentation/d/1N7GCdyusFmbxmirOqsIIAibfPrpU1zanCtId4-4KUFs/edit?slide=id.g3aaebe04ec3_0_192#slide=id.g3aaebe04ec3_0_192) | 2025–2026 | 5 moderated sessions, 46 survey respondents | Jobs tab SEQ 4.60, Duo Chat validated, top metrics identified |
| [Enterprise customer signals](https://gitlab.com/groups/gitlab-org/core-devops/planning/-/work_items/148#note_3220648609) | Ongoing | 2 customers | Pipeline review agent request, cache regression diagnosis |
| [USAT+ FY27 Q1 survey](https://claude.ai/artifacts/latest/ab206b29-f466-4b38-ba58-f0d5df552b2d) | March 2026 | 762 respondents, 6 themes | Runner reliability highest signal; confirms Tier 1 priorities |
| Pipeline failure classification [(Slack link)](https://gitlab.slack.com/archives/C0ACS5RCN0N/p1776722402201779) | April 2026 | 2,000 failing job logs | CI config errors 24.8%, lint failures 14.3%, test failures 12.4% |

## What we know

**Who is the primary persona?**

DevOps Platform Engineers and Engineering Managers responsible for pipeline health across multiple projects in a group. They need a group-level view, not individual pipeline inspection. They're the ones asking "where are things degrading and why?" across their entire organisation.
**What is the biggest problem?**

Job-level performance visibility, specifically "what's the time for each job and which are the slowest?" This question scored highest across all three dimensions (user pain 5/5, frequency 5/5, feasibility 5/5) and was confirmed by the USAT+ pipeline speed theme. The second-strongest signal is regression detection ("what config change caused this pipeline to slow down?"), which maps directly to the enterprise customer evidence and the USAT+ intermittent failures theme. Recommended sequencing: performance visibility first, then regression detection, then cost optimisation, because speed matters more than cost to users.

**Why a specialised agent, not just Duo Chat?**

Duo Chat handles reactive, conversational queries well: users can ask it questions and get useful answers. But it requires users to know what to ask. The core insight from our research is that users can see the data but struggle to interpret what it means and what to do next. A specialised agent would be proactive and contextual, reading the current dashboard state and surfacing insights, annotations, and suggested actions without the user having to initiate anything. The higher-autonomy concepts go further: the agent acts between user visits (auto-retrying failed jobs, flagging degradation) and the dashboard shows what happened.

## What we don't know

* **Prevention vs. detection** — \~39% of failures (CI config + lint) might be catchable before the pipeline runs. How much is addressable by an AI agent vs. editor tooling vs. pipeline authoring improvements? Needs engineering input.
* **Self-managed vs. SaaS differences** — Runner configuration complexity affects both deployment types, but for different reasons. Do agent behaviours need to differ?
* **Agent trust and autonomy thresholds** — Where is the comfort line between "suggest and wait" and "act autonomously"? Our concepts span the spectrum, but this hasn't been validated with users yet.
* **Flaky test classification** — Estimated at 7–8% of all failures but hard to detect from job logs alone. Needs historical data. Connected to the Pipeline Agents FY27 epic.
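As a rough illustration of why historical data helps where single job logs don't, here is a minimal sketch of one common flakiness heuristic: a job that both passed and failed on the same commit is a strong flaky candidate. The `(job, commit, status)` record shape is an assumption for illustration, not a real GitLab API:

```python
from collections import defaultdict

def flaky_jobs(history):
    """Return job names that recorded both 'success' and 'failed' for the
    same commit. `history` is an iterable of (job_name, commit_sha, status)
    tuples; this record shape is hypothetical, not a real API response."""
    outcomes = defaultdict(set)  # (job, sha) -> set of observed statuses
    for job, sha, status in history:
        outcomes[(job, sha)].add(status)
    # Both outcomes on one commit means the result flipped with no code change.
    return sorted({job for (job, _), seen in outcomes.items()
                   if {"success", "failed"} <= seen})

history = [
    ("rspec-unit", "abc123", "failed"),
    ("rspec-unit", "abc123", "success"),  # same commit, both outcomes: flaky
    ("lint", "abc123", "failed"),         # consistently failing, not flaky
    ("lint", "def456", "failed"),
]
print(flaky_jobs(history))  # -> ['rspec-unit']
```

This catches only retried-on-the-same-commit flips; a fuller classifier would also need branch context and failure-message clustering, which is why the estimate above is hard to confirm from logs alone.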