Commit 77530995 authored by Frédéric Caplette

Add spec driven development doc

parent c5312696
+127 −0
---
title: "Spec-Driven Development (SDD)"
description: "Design document for Spec-Driven Development, a structured planning layer for capturing intent and driving agentic execution on GitLab work items."
status: ongoing
creation-date: "2026-04-16"
authors: [ "@fredericcaplette" ]
coaches: [ "@ntepluhina" ]
dris: [ "@fredericcaplette", "@vanessaotto", "@marcsaleiko" ]
owning-stage: "~devops::plan"
participating-stages: []
toc_hide: true
---

<!-- Design Documents often contain forward-looking statements -->
<!-- vale gitlab.FutureTense = NO -->

{{< engineering/design-document-header >}}

## Summary

Spec-Driven Development (SDD) is about capturing intent and using it to generate outputs through agentic work. Today, the context needed to execute work is scattered across issues, comments, design files, code, and people's heads. Agents that try to act on a work item without structured input produce inconsistent results because they lack the why, how, and what.

SDD solves this by introducing a structured planning layer on GitLab work items. An **Agent plan** is collaboratively built by humans and AI agents, enriched with project context, and then consumed downstream by execution agents (Duo Developer) and validation agents (Duo Review).

We are building this feature with the mindset that anything we build should work for both agents and humans. In practice, SDD is a way of working, and one that is rising in popularity alongside agents. But the underlying concept is very human: a documentation-centric way of developing, accelerated by agents.

For product context see the [parent epic](https://gitlab.com/groups/gitlab-org/-/work_items/21218) and [wiki](https://gitlab.com/groups/gitlab-org/plan-stage/-/wikis/Spec-driven-development-(SDD)).

## End-to-end flow

```mermaid
flowchart LR
    subgraph context["1 · Context Gathering"]
        Memory[Memory]
        DL[Decision log]
    end

    subgraph generation["2 · Plan Generation"]
        IB[Interactive builder] -->|produces| WP[Agent plan widget]
        WP -->|evaluated by| Score[Readiness Score]
    end

    subgraph validation["3 · Plan Validation"]
        DD[Duo Developer] -->|produces| MR[Merge Request]
        DR[Duo Review] -->|validates MR against plan| MR
    end

    WI[Work item] -->|user opens| IB
    Memory -->|injected into prompt| IB
    DL -->|fed as context| IB
    WP -->|read by| DD
    WP -->|read by| DR
    WI ---|Work item ↔ MR link| MR
```

## Three layers to build

The architecture breaks down into three layers. Each layer has its own problems to solve and its own set of subpages with detailed designs.

### 1. Context Gathering

Before a plan can be generated, the agent needs context: project conventions, past decisions, architectural patterns, related work items, and team preferences. The challenge is figuring out where to store these layers of context, how to keep them current, and how to inject the right slices at the right time.

| Component | Description | Subpage |
| ----------- | ------------- | --------- |
| Memory | Long-lived project and team context stored in git, with multiple layers (project, group, user) | [Memory and context injection](memory.md) |
| Decision log | Structured decisions (pending and resolved) captured on the work item, fed into plan generation sessions | [Decision log](decision_log.md) |

### 2. Plan Generation

The Agent plan is the central artifact. Users talk with agents through the Interactive builder to iteratively shape a plan that captures the Why, How, and What, along with a clear list of pending questions and steps. Plans get refined over time and multiple stakeholders can contribute, so we need versioning, an audit trail, and a lightweight review flow.

| Component | Description | Subpage |
| ----------- | ------------- | --------- |
| Agent plan widget | Markdown-based work item widget that stores the plan | [Agent plan](work_plan.md) |
| Interactive builder | Reusable Duo Chat + live preview UI for iterating on LLM output | [Interactive builder](interactive_builder.md) |
| Plan readiness scoring | Lightweight quality gate that signals whether a plan is ready for agent execution | [Scoring](scoring.md) |
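
To make the idea of a lightweight quality gate concrete, here is a minimal sketch. It assumes the plan is raw Markdown and that the Why, How, and What parts appear as level-2 headings with exactly those names. That layout, the `readiness_score` function, and the `REQUIRED_SECTIONS` constant are all hypothetical; the actual scoring design is described in the Scoring subpage.

```python
# Assumption: the Why / How / What parts of a plan are level-2 Markdown headings.
REQUIRED_SECTIONS = ("Why", "How", "What")


def readiness_score(plan_markdown: str) -> float:
    """Return the fraction of required sections present as level-2 headings."""
    headings = {
        line[3:].strip()
        for line in plan_markdown.splitlines()
        if line.startswith("## ")
    }
    found = sum(1 for name in REQUIRED_SECTIONS if name in headings)
    return found / len(REQUIRED_SECTIONS)
```

A gate built on this could, for example, allow agent execution only when the score reaches 1.0. A real scorer would likely also look at the content of each section, not only its presence.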

### 3. Plan Validation

Once a plan is approved, it needs to carry weight. Downstream agents read the plan, execute against it, and validate that the resulting merge request matches the stated intent. This requires a strong link between work items and MRs.

| Component | Description | Subpage |
| ----------- | ------------- | --------- |
| Work item ↔ MR relationship | First-class bidirectional link so downstream agents can find and validate against the plan | [Work item to MR relationship](wi_mr_relationship.md) |
| Downstream consumers | How Duo Developer and Duo Review read and use the plan | [Downstream consumers](downstream_consumers.md) |

## Decisions

| Date | Decision | Who |
| ------ | ---------- | ----- |
| 2026-03-30 | Short-lived artifact is an **Agent plan** on the work item (not standalone, supports all work item types). | Workshop |
| 2026-03-31 | Output is **Markdown** through work item templates. | @fredericcaplette |
| 2026-03-31 | **Approval workflow out of scope** for current phase. | @izzychu, @shekharpatnaik |
| 2026-04-09 | Use **sync Duo Chat** for AI interactions on work items. | @vanessaotto, @fredericcaplette |
| 2026-04-09 | **Markdown over YAML** for human readability. | @fredericcaplette, @vanessaotto, @timzallmann |

## Constraints

- Agent sessions are single-user (async collaboration only through work item comments)
- Work item approvals do not exist on the platform
- MR-to-work-item link is the only bridge for Duo Review to access the plan
- IDE interactive builder out of scope for v1
- Long-lived Spec storage unresolved

## Timeline

| Workstream | Target | Confidence |
|------------|--------|------------|
| 0 - [Agent plan widget](https://gitlab.com/groups/gitlab-org/-/work_items/21511) | 2026-06-30 | Medium |
| 0.5 - [Plan scoring](https://gitlab.com/gitlab-org/gitlab/-/work_items/596916) | TBD | Not started |
| 1 - [MR enforcement](https://gitlab.com/groups/gitlab-org/-/work_items/21514) | TBD | Not started |
| 2 - [Memory loop](https://gitlab.com/groups/gitlab-org/-/work_items/21512) | TBD | Not started |
| 3 - [Decision log](https://gitlab.com/groups/gitlab-org/-/work_items/21552) | 2026-06-30 | Medium |
| [Interactive builder](https://gitlab.com/groups/gitlab-org/-/work_items/21653) | TBD | Medium |

Release stages: Core team (now) -> Internal preview (2026-05-30) -> Customer preview (TBD) -> GA (TBD).

## Links

- [Parent Epic](https://gitlab.com/groups/gitlab-org/-/work_items/21218)
- [Wiki SSoT](https://gitlab.com/groups/gitlab-org/plan-stage/-/wikis/Spec-driven-development-(SDD))
- [AI Panel architecture](../ai_panel/_index.md)
- [Design Issue](https://gitlab.com/gitlab-org/gitlab/-/work_items/592316)
- [Engineering Spike](https://gitlab.com/gitlab-org/gitlab/-/work_items/592314)
- [Workshop notes](https://docs.google.com/document/d/1zFs7ziXNBrY7rYvXhhWZvyavWT-j_DAqNwmxbMMTa4o/edit?tab=t.0)
+36 −0
---
title: "Decision log"
description: "Design for the Decision log work item widget that captures structured decisions for SDD context gathering."
status: ongoing
maturity: still defining
creation-date: "2026-04-16"
authors: [ "@fredericcaplette" ]
owning-stage: "~devops::plan"
toc_hide: true
---

Read more about SDD in [Spec-Driven Development](_index.md).

**Maturity: Still defining**

## Summary

The Decision log is a new work item widget that captures structured decisions (pending and decided). It provides a reference point for humans and agents and is included as context whenever an Agent plan generation session starts.

Epic: [Workstream 3 — Decision log](https://gitlab.com/groups/gitlab-org/-/work_items/21552)

## Data model

The Decision log is a list of decision entries on the work item. Each entry has a description, a status (pending or decided), an author, a date, and optionally an assignee. The storage format is TBD: the options are a JSON column on a widget table or individual records in a dedicated table. Because the issue table is already very large, a separate table is the more likely choice.
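
As an illustration only, the entry shape described above could be modeled as follows. The class and field names (`DecisionEntry`, `entry_date`, and so on) are assumptions made for this sketch and say nothing about the eventual storage format.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class DecisionStatus(Enum):
    PENDING = "pending"
    DECIDED = "decided"


@dataclass
class DecisionEntry:
    """One entry in a work item's Decision log (field names are illustrative)."""

    description: str
    status: DecisionStatus
    author: str
    entry_date: date
    assignee: Optional[str] = None  # optional, as in the description above


def pending(entries: list[DecisionEntry]) -> list[DecisionEntry]:
    """Return the entries that still need a decision."""
    return [entry for entry in entries if entry.status is DecisionStatus.PENDING]
```

Keeping the status as a closed set of two values mirrors the pending/decided distinction and makes it cheap to filter the log for open questions.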

## Interactions

- Humans can add pending decisions manually.
- Agents can read the log as input context and write new entries through work item tools in AI Gateway.
- Resolving a comment thread with "Make a decision" stores the result in the log, bridging the Notes system with the Decision log widget.
- Assigning a pending decision to a user creates a to-do item through the existing to-do system.
- Pending comment threads are surfaced in the widget, so that users can make a decision for each thread directly from the widget and no thread is left unanswered.

## Consumption

Whenever a Duo Chat session starts on a work item, the Decision log is read through the work item tool.
+51 −0
---
title: "Downstream consumers"
description: "How Duo Developer and Duo Review consume the Agent plan for execution and validation."
status: ongoing
maturity: still defining
creation-date: "2026-04-16"
authors: [ "@fredericcaplette" ]
owning-stage: "~devops::plan"
toc_hide: true
---

Read more about SDD in [Spec-Driven Development](_index.md).

**Maturity: Still defining**

## Summary

The Agent plan's value comes from being consumed by downstream systems. This page covers how Duo Developer and Duo Review read and use the plan.

## Duo Developer handoff

When a user triggers Duo Developer from a work item, the agent needs to read the Agent plan. Two options:

1. The Flow's first step fetches the plan through the work item GraphQL API.
2. The plan is injected into the agent's context at session creation.

The chosen approach and its implementation are TBD.
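
The difference between the two options is mainly who resolves the plan and when. The sketch below contrasts them under stated assumptions: `PlanFetcher` stands in for a call to the work item GraphQL API, and both function names are hypothetical rather than part of any existing Flow or session API.

```python
from typing import Callable

# Hypothetical signature: takes a work item ID and returns the plan as Markdown.
PlanFetcher = Callable[[str], str]


def start_session_with_fetch_step(work_item_id: str, fetch_plan: PlanFetcher) -> list[str]:
    """Option 1: the session starts empty and its first step fetches the plan."""
    context: list[str] = []
    context.append(fetch_plan(work_item_id))  # stands in for the GraphQL request
    return context


def start_session_with_injected_plan(plan_markdown: str) -> list[str]:
    """Option 2: the caller resolves the plan and injects it at session creation."""
    return [plan_markdown]
```

In both cases the agent ends up with the same context. Option 1 keeps the session creation call simple but requires the agent to have API access, while option 2 moves the lookup to whoever creates the session.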

## Duo Review validation

The Duo Review agent validates MR changes against the originating Agent plan. The link between MR and plan is established through the [work item–MR relationship](wi_mr_relationship.md). The review agent fetches the plan from the linked work item and uses it as evaluation criteria.

Epic: [Workstream 1 — MR spec enforcement](https://gitlab.com/groups/gitlab-org/-/work_items/21514)

## Merge check in MR

Validating the Agent plan at merge time (as a merge check) is still being defined. This may be covered by the [Code Review Lifecycle Transformation](https://docs.google.com/document/d/1BsdLonLqNyB0QSdEiUaJJ5GNQLkr-gKx_uQNTApdfCA/edit?tab=t.0) initiative workstream rather than built within SDD directly.

## API surface

The Agent plan is exposed through the standard work item widget GraphQL/REST API. External tools (`curl`, `glab`, IDE extensions) can read and write plans using this surface.

## Structured vs raw Markdown

Downstream agents consume the plan as raw Markdown text today. If the plan evolves to a richer structure (sections, acceptance criteria, test cases), serialization and backward compatibility will need to be addressed.
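
One way to add structure without breaking consumers of raw Markdown is to derive sections from the existing text rather than change the stored format. The following is a minimal sketch of that idea. It assumes sections are marked by level-2 headings, and `split_sections` is a hypothetical helper, not an existing API.

```python
import re


def split_sections(plan_markdown: str) -> dict[str, str]:
    """Split a Markdown plan into {heading: body} using level-2 headings.

    Text before the first heading is kept under the empty-string key,
    so no part of the raw plan is silently dropped.
    """
    sections: dict[str, str] = {"": ""}
    current = ""
    for line in plan_markdown.splitlines():
        match = re.match(r"^##\s+(.*\S)\s*$", line)
        if match:
            current = match.group(1)
            sections.setdefault(current, "")
        else:
            sections[current] += line + "\n"
    return {title: body.strip() for title, body in sections.items()}
```

Consumers that only understand raw Markdown keep reading the original string, while newer consumers can opt in to the derived view.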

## Decisions

| Date | Decision | Who |
|------|----------|-----|
| 2026-04-17 | Agent plan format consumed by downstream agents is **Markdown** (not YAML or structured JSON). Markdown is human-readable, already the common format of GitLab descriptions, and avoids serialization overhead for consumers. | @fredericcaplette |
+40 −0
---
title: "Interactive builder"
description: "Design for the Interactive builder, a reusable Duo Chat and live preview UI framework for iterating on LLM output."
status: ongoing
maturity: mature
creation-date: "2026-04-16"
authors: [ "@fredericcaplette" ]
owning-stage: "~devops::plan"
toc_hide: true
---

Read more about SDD in [Spec-Driven Development](_index.md).

**Maturity: Mature**

## Summary

The Interactive builder is a generic, reusable UI framework for iterating on LLM output. The left side is a Duo Chat conversation, and the center pane shows a live preview of the artifact being produced. SDD (Agent plan generation) is the first use case, but the framework is feature-agnostic.

Epic: [Interactive builder](https://gitlab.com/groups/gitlab-org/-/work_items/21653)

[PoC video](https://www.youtube.com/watch?v=UjZ-yHNg6Ic)

## Relationship to AI Panel

The Interactive builder lives inside the existing [AI Panel architecture](../ai_panel/_index.md) as a new sub-application. It follows the same component agnosticism, props passthrough, and event-driven communication patterns described there.

## Live preview

As the agent updates the Agent plan through tool calls, the preview pane reactively reflects the change. The propagation mechanism (Apollo cache write on tool completion vs subscription) is TBD.

## Generality

The framework exposes a plugin contract so other features can register their own artifact type and preview component. The specifics of this contract are TBD.
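
Since the contract is not yet defined, the following is only a language-neutral sketch of the registration idea, written in Python for brevity. The real implementation would live in the frontend, and every name here (`register_artifact_type`, `render_preview`, `PreviewRenderer`) is hypothetical.

```python
from typing import Callable

# Hypothetical contract: a preview renderer turns artifact content into display text.
PreviewRenderer = Callable[[str], str]

_registry: dict[str, PreviewRenderer] = {}


def register_artifact_type(name: str, renderer: PreviewRenderer) -> None:
    """Register a preview renderer for a new artifact type."""
    if name in _registry:
        raise ValueError(f"artifact type already registered: {name}")
    _registry[name] = renderer


def render_preview(name: str, content: str) -> str:
    """Render content with the renderer registered for the given artifact type."""
    try:
        renderer = _registry[name]
    except KeyError:
        raise LookupError(f"no preview registered for artifact type: {name}") from None
    return renderer(content)
```

The builder itself only knows about the registry, which is what would keep it feature-agnostic: SDD would register an Agent plan renderer, and other features would register their own.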

## Scope

The Interactive builder is a refinement on top of the base Agent plan + Duo Chat experience. For v1 the widget and chat are sufficient; the builder is a follow-up iteration.

This workstream is **on ice** until we validate that the Agent plan itself adds value through the base Duo Chat flow. Investing in a richer builder UI before confirming the core artifact is useful would be premature.
+36 −0
---
title: "Memory and context injection"
description: "Design for modeling, storing, and injecting project context and memory into Agent plan generation."
status: ongoing
maturity: not for now
creation-date: "2026-04-16"
authors: [ "@fredericcaplette" ]
owning-stage: "~devops::plan"
toc_hide: true
---

Read more about SDD in [Spec-Driven Development](_index.md).

**Maturity: Not for now**

## Summary

Agents generating or refining an Agent plan need context beyond the current work item: project conventions, past decisions, architectural patterns, team preferences. This page covers how that external context is modeled, stored, and injected.

Epic: [Workstream 2 — Memory loop](https://gitlab.com/groups/gitlab-org/-/work_items/21512)

## Memory layers

Context exists at multiple scopes: Project, Group, Instance, Organization. The model for retrieving and prioritizing layered context is TBD. A possible fifth scope (Product) has been discussed.
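
One possible retrieval model is a simple override chain, where more specific scopes win over broader ones. The sketch below assumes that ordering (instance, then organization, then group, then project) and a flat key-value shape for context. Both are assumptions made for illustration, since the actual model is TBD.

```python
# Assumed precedence, from broadest to most specific. The real ordering is TBD.
SCOPES = ["instance", "organization", "group", "project"]


def resolve_context(layers: dict[str, dict[str, str]]) -> dict[str, str]:
    """Merge per-scope context so that more specific scopes override broader ones."""
    merged: dict[str, str] = {}
    for scope in SCOPES:
        merged.update(layers.get(scope, {}))
    return merged
```

With this approach a project can override a group-level convention while still inheriting everything it does not redefine. A production model would likely also need to rank and trim context to fit a prompt budget.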

## Injection point

Where in the AI Gateway pipeline memory gets injected — before prompt assembly as system context, or as a tool the agent calls on demand — is TBD.

## Learning loop

When an agent completes work and the result is reviewed (approved/rejected), the outcome should be captured as a learning and fed back into memory for future plan generation. The mechanism for this feedback loop is TBD.

## Storage

Where memory lives — repository files (`.gitlab/memory/`), a database-backed store, or the Wiki — is TBD.