Explore agent traces in merge requests to help review AI-assisted contributions (#596433)


### Release notes

Explore support for surfacing agent traces in GitLab merge requests so reviewers can inspect not only the diff, but also the prompt or task, agent steps, tool usage, attribution, and supporting evidence behind AI-assisted changes.

As AI agents produce more code, maintainers and reviewers need better review artifacts to understand what was asked, what happened in between, and what still needs human verification. GitLab could explore whether agent-trace metadata can make AI-assisted changes more reviewable, inspectable, and trustworthy inside the merge request flow, and whether it can also serve as input for the Duo platform to build automations based on intent and context.

### Problem to solve

We are seeing more and more AI- and agent-written pull requests across open source, and maintainers are increasingly the people who have to absorb the cost of that shift.

The problem is not only that these contributions can be harder to review. As the effort required to contribute drops, they often arrive with too little context, too little accountability, and too little signal about how the change came to be. A diff may show what changed, but not what the agent was asked to do, what steps it took, what tools it used, what assumptions it made, or where a human contributor actually understood, edited, and verified the result.
As output grows faster than understanding, maintainers inherit the burden in the form of [comprehension debt](https://addyosmani.com/blog/comprehension-debt/), [cognitive debt, and intent debt](https://arxiv.org/abs/2603.22106). This is one of the central themes in my [post and related writing](https://www.ronaldtebrake.nl/blog/beyond-the-diff/): maintainers end up carrying the gap between generated output and actual understanding.

In OSS, that burden is amplified because maintainers are often volunteers, context is fragmented, and review quality is one of the last meaningful safeguards before code is accepted. Some communities are already responding by tightening contribution rules or pushing back on low-context AI submissions.

This means the problem is broader than "reviewing AI code faster." It is also about preserving healthy contribution workflows in open source, reducing maintainer overload, and making AI-assisted contributions more trustworthy and inspectable instead of forcing communities toward blanket restrictions.

GitLab sits at a critical point in the delivery lifecycle. Merge requests are where changes, discussion, verification, and approval come together. That makes GitLab not only a place where the problems of agentic workflows become visible, but also a strong place to explore solutions. If agentic development is introducing new trust and maintainability challenges, the merge request is where context such as intent, provenance, attribution, and evidence can be surfaced, and where GitLab can create tools to help reviewers and maintainers deal with those challenges in the workflow they already use.

This issue is therefore exploratory: can GitLab help maintainers deal with the growing volume and complexity of agent-written merge requests by surfacing agent-trace context directly in the merge request workflow, and by using that context to give maintainers the tools they need to deal with the consequences?
### Proposal

GitLab could explore an agent trace view in merge requests.

At a minimum, that might mean showing:

* the prompt, task, or intent behind the change
* tools or skills used during execution
* files referenced or reviewed
* attribution showing agent-authored versus human-edited code
* supporting evidence such as tests run, checks performed, or validation notes by the agent or human

Additional value directions:

* GitLab Duo platform implementations like:
  * Risk evaluation based on intent
  * Instead of Code Owners, assign intent owners
  * ..?
* AI statistics
  * Track which model/tool is used and merged for OSS
  * Track the amount of AI contributions
  * ? Track XYZ to help write better documentation for LLMs

This should be treated as an additional review layer, not a replacement for reading risky code. The goal would be to let review start closer to intent, while still keeping human verification central.

This is intentionally exploratory and not tied to one implementation. There are already signs that this is becoming a broader pattern:

- [Entire.io](https://entire.io/) explores git-native checkpoints and intent review for AI-generated PRs
- [git-ai](https://usegitai.com/) tracks AI-generated code and links lines to the agent, model, and transcript
- [Agent Trace](https://agent-trace.dev/) is being discussed as an open specification for tracing AI-generated code
- There is also active discussion around representing Agent Trace natively with OpenTelemetry, which might make sense given existing implementations of that spec and would help throughout the lifecycle.

Curious to hear your stance on this so we can fine-tune the proposal.
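To make the minimum field list in the proposal a bit more concrete, here is a rough sketch of what a trace record attached to a merge request could look like. Everything in it is an assumption for illustration: the class names (`AgentTrace`, `LineAttribution`, `Evidence`), the field names, and the two helper methods are hypothetical and do not reflect the Agent Trace specification, git-ai, Entire, or any existing GitLab API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Author(Enum):
    """Who produced a piece of code or evidence (hypothetical categories)."""
    AGENT = "agent"
    HUMAN = "human"


@dataclass
class LineAttribution:
    """Attributes an inclusive line range in one file to an agent or a human."""
    path: str
    start_line: int
    end_line: int
    author: Author

    def line_count(self) -> int:
        return self.end_line - self.start_line + 1


@dataclass
class Evidence:
    """A test run, check, or validation note backing the change."""
    kind: str  # e.g. "test_run", "check", "validation_note" (illustrative values)
    summary: str
    provided_by: Author


@dataclass
class AgentTrace:
    """Hypothetical trace record mirroring the minimum fields listed in the proposal."""
    intent: str  # the prompt, task, or intent behind the change
    tools_used: list[str] = field(default_factory=list)
    files_referenced: list[str] = field(default_factory=list)
    attribution: list[LineAttribution] = field(default_factory=list)
    evidence: list[Evidence] = field(default_factory=list)

    def agent_authored_share(self) -> float:
        """Fraction of attributed lines written by the agent (0.0 if none are attributed)."""
        total = sum(a.line_count() for a in self.attribution)
        if total == 0:
            return 0.0
        agent = sum(a.line_count() for a in self.attribution if a.author is Author.AGENT)
        return agent / total

    def has_human_evidence(self) -> bool:
        """True if at least one piece of evidence was provided by a human."""
        return any(e.provided_by is Author.HUMAN for e in self.evidence)


# Example values are made up for illustration.
trace = AgentTrace(
    intent="Fix flaky retry logic in the upload client",
    tools_used=["read_file", "run_tests"],
    files_referenced=["lib/upload/client.rb"],
    attribution=[
        LineAttribution("lib/upload/client.rb", 10, 29, Author.AGENT),  # 20 lines
        LineAttribution("lib/upload/client.rb", 30, 34, Author.HUMAN),  # 5 lines
    ],
    evidence=[Evidence("test_run", "unit suite passed locally", Author.HUMAN)],
)

print(trace.agent_authored_share())  # 0.8
print(trace.has_human_evidence())    # True
```

A reviewer-facing view could use something like `agent_authored_share()` and `has_human_evidence()` as quick signals for where to focus, but the actual shape would depend on which trace format GitLab chooses to support.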
### Intended users

There are many possible intended users. Here are a few to focus on first:

* Sasha (Software Developer) reviewing or authoring AI-assisted merge requests
* Delaney (Development Team Lead) responsible for review quality, delivery flow, and team confidence
* Rachel (Release Manager) when traceability, change confidence, and release accountability matter
* Cameron (Compliance Manager) in environments where provenance, attribution, and auditability are important

Depending on the implementation, what we capture, and how we show it, I can imagine more interesting use cases, including on the buyer side.

### Feature Usage Metrics

Possible metrics to explore:

- number of merge requests with attached trace metadata
- number or percentage of reviewers opening the trace view
- interaction depth with trace sections
- percentage of AI-assisted MRs that include evidence such as tests or validation notes
- qualitative feedback from reviewers on whether traces improved confidence or reduced guesswork

A useful signal of value would be whether reviewers actually use the trace to understand intent before or during code review, and whether it leads to more merges.

### Does this feature require an audit event?
Possibly yes, depending on scope.

If GitLab stores, exposes, or governs access to agent-trace metadata, audit events may be useful for:

- enabling or disabling the feature at project or group level
- accessing sensitive trace payloads in regulated environments, and the permissions governing that access
- changing retention or visibility settings for trace data

## Links / references

Feel free to remove this, though it might help set the narrative, as I've been exploring what this could mean in [Beyond the Diff: What Maintainers Need from an Agentic Workflow](https://www.ronaldtebrake.nl/blog/beyond-the-diff/).

![intent.png](/uploads/d1f75d39b34ec43eb2825f38690d685c/intent.png)

![attribution.png](/uploads/3a8d49cd58dcf697f300d89520cc3dc6/attribution.png){width=473 height=600}

1. [**The Entire CLI: How It Works & Where It’s Headed**](https://entire.io/blog/the-entire-cli-how-it-works-and-where-its-headed)
2. [**Comprehension Debt - the hidden cost of AI generated code**](https://addyosmani.com/blog/comprehension-debt/)
3. [**From Technical Debt to Cognitive and Intent Debt: Rethinking Software Health in the Age of AI**](https://arxiv.org/abs/2603.22106)
4. [**RFC: Native OpenTelemetry representation for Agent Trace**](https://github.com/cursor/agent-trace/issues/6)

~"feature::addition" ~"type::feature"
