Duo Developer: replace shallow-clone + unshallow workaround with a developer-like workspace setup

Problem

Duo Developer's coding agent has grown beyond "implement an issue from scratch" into a broad set of developer-like workflows: code research, implementing MR feedback, fixing failing pipelines, rebasing, resolving merge conflicts, and more. These workflows all assume a workspace that behaves like a local developer machine, which is what the underlying models are trained on.

The workspace they actually get today does not behave that way, and the gap is producing two visible classes of failures:

  • Slow or timing-out flow startup, especially on large monorepos like gitlab-org/gitlab, caused by a git fetch --unshallow step that has to download multiple GB of history.
  • Agent confusion from a detached HEAD and missing refs, leading to wrong assumptions when the agent runs typical developer git commands (git status, git rebase, git push, git log main..HEAD, etc.).

How it works today

The workspace is prepared in three layers before the agent runs. There is no service-side clone — everything is driven by env vars consumed by GitLab Runner inside the CI job.

1. StartWorkflowService#git_clone_variables (ee/app/services/ai/duo_workflows/start_workflow_service.rb#L154) sets:

  • GIT_DEPTH=1
  • GIT_FETCH_EXTRA_FLAGS=--filter=blob:none (or --filter=tree:0 behind FF dap_git_tree_zero_option)
  • GIT_LFS_SKIP_SMUDGE=1

Note: combining GIT_DEPTH=1 with a partial-clone filter gains little. At depth 1 there is no history whose blobs could be skipped, and checkout needs the blobs of the single fetched commit anyway, so we pay shallow-clone costs without gaining partial-clone benefits.

2. GitLab Runner's GetSources stage then runs in the job container (functions/concrete/run/stages/get_sources.go, shells/abstract.go):

  • git init + git remote add origin …
  • git fetch origin <refspec> <sha> --depth 1 --filter=blob:none
  • git checkout -f -q <SHA> — logged as "Checking out <short> as detached HEAD". This is the source of the detached-HEAD state, by design of the runner.
  • git clean -ffdx

3. The developer flow config (developer_unstable/1.0.0.yml) adds a git_unshallow deterministic step before the agent: if git rev-parse --is-shallow-repository | grep -q true; then git fetch --unshallow; fi, with a 600 s timeout. This was added as a workaround after we discovered shallow clones caused orphan commits in MRs (see ai-assist#2109).
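The sequence in steps 2 and 3 can be reproduced locally to observe the resulting workspace state. The following is a minimal sketch, assuming a throwaway repository under a temporary directory stands in for the real project; the paths, the single-branch refspec, and the two demo commits are illustrative and not what the runner actually uses.

```shell
# Throwaway "server" repository with two commits on main (illustrative only).
root=$(mktemp -d)
git init -q "$root/origin"
git -C "$root/origin" symbolic-ref HEAD refs/heads/main
git -C "$root/origin" config uploadpack.allowFilter true         # accept --filter requests
git -C "$root/origin" config uploadpack.allowAnySHA1InWant true  # accept lazy blob fetches
for n in 1 2; do
  echo "change $n" > "$root/origin/file.txt"
  git -C "$root/origin" add file.txt
  git -C "$root/origin" -c user.name=demo -c user.email=demo@example.com commit -q -m "commit $n"
done
sha=$(git -C "$root/origin" rev-parse HEAD)

# Step 2: the runner-style sequence (refspec simplified to one branch).
git init -q "$root/work"
cd "$root/work"
git remote add origin "file://$root/origin"
git fetch -q --depth 1 --filter=blob:none origin "+refs/heads/main:refs/remotes/origin/main"
git checkout -f -q "$sha"
git clean -ffdx -q

# Record what the agent would see at this point.
shallow_before=$(git rev-parse --is-shallow-repository)
git symbolic-ref -q HEAD >/dev/null && head_state=attached || head_state=detached

# Step 3: the git_unshallow workaround from the flow config.
if git rev-parse --is-shallow-repository | grep -q true; then git fetch -q --unshallow; fi
shallow_after=$(git rev-parse --is-shallow-repository)

echo "shallow before: $shallow_before, after: $shallow_after, HEAD: $head_state"
```

In this sketch the unshallow step changes only the shallow flag; HEAD stays detached because nothing in the sequence creates or checks out a local branch.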

What's actually broken

Today's setup conflates three independent dimensions of "what the workspace looks like" into one knob (shallow_clone):

| Dimension | Today's state | Symptom |
|---|---|---|
| History depth | depth=1 | rebase, merge-base, log A..B, cherry-pick misbehave; MR diffs can show every file as changed |
| Object completeness | --filter=blob:none / tree:0 (moot at depth 1) | None today, but matters once depth is lifted |
| HEAD state & refs | Detached HEAD at SHA; no local branch; target branch not reliably present as refs/remotes/origin/<x> | "Detached HEAD" warnings, agent confused about current branch, git push needs explicit refspec, can't diff/rebase against target branch |

The git_unshallow workaround addresses only the first dimension, and addresses it in the most expensive way possible: a full unshallow of the entire history. It does nothing for the HEAD-state issues, which is where most of the observable agent confusion originates.
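The three dimensions above can each be inspected separately with standard git commands, which may help when debugging a workspace by hand. This is a minimal sketch; the function name workspace_report is hypothetical, and it assumes the remote is named origin.

```shell
# Print the state of each workspace dimension for the repository in the current directory.
workspace_report() {
  # History depth: "true" means the repository is shallow.
  echo "shallow: $(git rev-parse --is-shallow-repository)"
  # Object completeness: a partial clone records its filter on the promisor remote.
  echo "filter: $(git config --get remote.origin.partialclonefilter || echo none)"
  # HEAD state: symbolic-ref fails when HEAD is detached.
  echo "branch: $(git symbolic-ref --short -q HEAD || echo '(detached)')"
  # Refs: number of remote-tracking refs available under origin.
  echo "remote branches: $(git for-each-ref refs/remotes/origin | wc -l | tr -d ' ')"
}
```

Running workspace_report inside the job container before the agent starts would show which of the three dimensions differs from a normal developer checkout.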

Proposed direction (AI-assisted investigation — to be validated)

Replace the current shallow + unshallow strategy with a workspace that resembles a developer's local checkout: full history (cheaply, via partial clone), all branch refs present, and HEAD attached to a real local branch.

Concretely, three changes that can be made independently:

  1. Drop GIT_DEPTH=1 from the default in git_clone_variables. Keep --filter=blob:none (or --filter=tree:0) for cheap initial download with lazy blob fetch. This gives full commit + tree history immediately without paying for every blob.
  2. Ensure all branch refs are fetched so that origin/<target_branch>, origin/main, etc. are available locally. This is cheap because of the partial clone.
  3. Attach HEAD to a named local branch (git checkout -B <workload_branch> <sha>) so git status, git branch, and git push behave normally.

Flow YAML should not contain setup steps, so where changes 2 and 3 live (runner config hook, runner image entrypoint, or service-side commands) is to be decided during implementation.
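Taken together, the three changes amount to roughly the following git sequence. This is a sketch under assumptions, not the proposed implementation: the function name prepare_workspace is hypothetical, and it assumes the target SHA is reachable from one of the fetched branch heads.

```shell
# $1 = repository URL, $2 = commit SHA to work on,
# $3 = local branch name, $4 = target directory
prepare_workspace() {
  git init -q "$4" || return 1
  git -C "$4" remote add origin "$1" || return 1
  # Changes 1 and 2: no --depth, blobless filter, all branch heads as origin/<branch>.
  git -C "$4" fetch -q --filter=blob:none origin "+refs/heads/*:refs/remotes/origin/*" || return 1
  # Change 3: attach HEAD to a named local branch at the requested commit.
  git -C "$4" checkout -q -B "$3" "$2"
}
```

With a workspace prepared this way, git status and git branch report the named branch, and commands such as git log origin/main..HEAD have both the history and the refs they need.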

This should also be designed to support multiple future clone modes (e.g., a future "initial-clone-only" mode at depth 1 with no history, for the fastest possible startup when a flow is known not to need history) by turning shallow_clone into a richer clone_mode parameter.
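One way to picture the clone_mode parameter is as a mapping from a mode name to the runner variables it sets. The sketch below is written in shell only for illustration (the real logic would presumably live in git_clone_variables), and the mode names partial_history and initial_clone_only are hypothetical.

```shell
# Print the runner variables for a given clone mode, one VAR=value per line.
clone_variables() {
  case "$1" in
    partial_history)
      # Full commit and tree history, blobs fetched lazily (no GIT_DEPTH).
      echo "GIT_FETCH_EXTRA_FLAGS=--filter=blob:none"
      echo "GIT_LFS_SKIP_SMUDGE=1"
      ;;
    initial_clone_only)
      # Fastest startup for flows known not to need history.
      echo "GIT_DEPTH=1"
      echo "GIT_LFS_SKIP_SMUDGE=1"
      ;;
    *)
      echo "unknown clone_mode: $1" >&2
      return 1
      ;;
  esac
}
```

Keeping the mapping in one place means a new mode only adds a branch here, rather than another boolean alongside shallow_clone.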

Considered alternatives

  • Keep and optimize the unshallow step. Rejected: doesn't address detached HEAD or missing refs, scales badly with repo size, fragile timeout.
  • Per-flow clone profiles (rebase needs history, implement-issue doesn't). Rejected for now: every current developer use case eventually needs history; maintaining the matrix isn't worth it. The clone_mode parameter leaves the door open if a genuinely no-history mode appears.
  • Runner-side reference repo / cached working copy. Deferred: real win on monorepos, but orthogonal to the correctness problems above. Worth a follow-up issue.

Expectations / Definition of Done

  • The setup is testable locally (e.g. against a local GitLab Runner) without requiring a full Duo platform deployment, so changes can be iterated on quickly.

  • Timing measurements are collected and shared for each option that is seriously considered. Measure from CI job start (beginning of get_sources) to the moment the developer_agent node begins executing — i.e. the wall-clock time the user waits today, including the current git_unshallow step. Run on at least:

    • A large monorepo: gitlab-org/gitlab
    • A small-to-medium repo (pick a representative one)

    Options to compare at minimum: current behavior (depth=1 + blob:none + unshallow step), no depth + blob:none, no depth + tree:0, and depth=1 alone (no unshallow).

  • The detached HEAD warning is gone in normal flow runs and git status / git branch report a real branch.

  • The git_unshallow deterministic step is removed from developer_unstable/1.0.0.yml (and the equivalent developer/2.0.0.yml / developer/2.0.0-orbit.yml configs).

  • The solution does not regress small-repo startup time.
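For the timing comparison above, the git portion of each option can be timed locally before running full flows. This is a minimal sketch with whole-second resolution; the helper name time_step is hypothetical, and it does not cover the full span from CI job start to the developer_agent node, which would still need to be read from job logs.

```shell
# Run a command and print how long it took, labelled for later comparison.
time_step() {
  label=$1
  shift
  start=$(date +%s)
  "$@"
  status=$?
  end=$(date +%s)
  echo "$label: $((end - start))s (exit $status)"
  return "$status"
}
```

A usage example, with the repository URL and target directory as placeholders: time_step "no depth + blob:none" git clone -q --filter=blob:none <url> <dir>.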

Open questions

  • Where should the HEAD-attach and base-branch-fetch live? Candidates: runner config (post_get_sources_script), workflow-generic-image entrypoint, service-side set_up_executor_commands. To be decided during implementation.
  • Should the per-flow clone_mode be set in the flow YAML, in CreateAndStartWorkflowService params, or both?