Duo Developer: replace shallow-clone + unshallow workaround with a developer-like workspace setup
Problem
Duo Developer's coding agent has grown beyond "implement an issue from scratch" into a broad set of developer-like workflows: code research, implementing MR feedback, fixing failing pipelines, rebasing, resolving merge conflicts, and more. These workflows all assume a workspace that behaves like a local developer machine, which is what the underlying models are trained on.
The workspace they actually get today does not behave that way, and the gap is producing two visible classes of failures:
- Slow or timing-out flow startup, especially on large monorepos like `gitlab-org/gitlab`, caused by a `git fetch --unshallow` step that has to download multiple GB of history.
- Agent confusion from a detached HEAD and missing refs, leading to wrong assumptions when the agent runs typical developer git commands (`git status`, `git rebase`, `git push`, `git log main..HEAD`, etc.).
How it works today
The workspace is prepared in three layers before the agent runs. There is no service-side clone — everything is driven by env vars consumed by GitLab Runner inside the CI job.
1. `StartWorkflowService#git_clone_variables` (`ee/app/services/ai/duo_workflows/start_workflow_service.rb#L154`) sets:
   - `GIT_DEPTH=1`
   - `GIT_FETCH_EXTRA_FLAGS=--filter=blob:none` (or `--filter=tree:0` behind FF `dap_git_tree_zero_option`)
   - `GIT_LFS_SKIP_SMUDGE=1`
   Note: `GIT_DEPTH=1` combined with a partial-clone filter works against itself. At depth 1 there is no history for the filter to trim, and checkout immediately fetches every object needed for HEAD anyway, so we pay shallow-clone costs without gaining partial-clone benefits.
2. GitLab Runner's `GetSources` stage then runs in the job container (`functions/concrete/run/stages/get_sources.go`, `shells/abstract.go`):
   - `git init` + `git remote add origin …`
   - `git fetch origin <refspec> <sha> --depth 1 --filter=blob:none`
   - `git checkout -f -q <SHA>`, logged as "Checking out `<short>` as detached HEAD". This is the source of the detached-HEAD state, by design of the runner.
   - `git clean -ffdx`
3. The developer flow config (`developer_unstable/1.0.0.yml`) adds a `git_unshallow` deterministic step before the agent: `if git rev-parse --is-shallow-repository | grep -q true; then git fetch --unshallow; fi`, with a 600 s timeout. This was added as a workaround after we discovered that shallow clones caused orphan commits in MRs (see ai-assist#2109).
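Since the Definition of Done asks for a setup that can be tested locally, the three layers above can be approximated in a short shell function without a runner. This is a minimal sketch, not the runner's implementation: the function name `prepare_workspace_today` is hypothetical, it assumes the pipeline SHA is the tip of the fetched branch, and it leaves out credentials, submodules, LFS, the runner's exact refspecs, and the 600 s timeout.

```shell
#!/usr/bin/env bash
# Hypothetical local approximation of today's workspace setup: runner
# GetSources (depth 1, partial clone, detached checkout) followed by the
# flow's git_unshallow step. Not the runner's actual code.
set -euo pipefail

# Usage: prepare_workspace_today <repo_url> <branch> <sha> <dir>
# Assumes <sha> is the tip of <branch> on the remote.
prepare_workspace_today() {
  local url="$1" branch="$2" sha="$3" dir="$4"

  # Layer 2: roughly what GitLab Runner's GetSources stage does.
  git init -q "$dir"
  git -C "$dir" remote add origin "$url"
  # Mirrors GIT_DEPTH=1 and GIT_FETCH_EXTRA_FLAGS=--filter=blob:none.
  git -C "$dir" fetch -q --depth 1 --filter=blob:none origin \
    "+refs/heads/${branch}:refs/remotes/origin/${branch}"
  # Checking out a raw SHA leaves HEAD detached, as in the runner log.
  git -C "$dir" checkout -f -q "$sha"
  git -C "$dir" clean -ffdx -q

  # Layer 3: the git_unshallow deterministic step (timeout omitted).
  if git -C "$dir" rev-parse --is-shallow-repository | grep -q true; then
    git -C "$dir" fetch -q --unshallow
  fi
}
```

Running it against a local copy of a repository makes the end state easy to inspect: full history after the unshallow step, but HEAD still detached.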
What's actually broken
Today's setup conflates three independent dimensions of "what the workspace looks like" into one knob (shallow_clone):
| Dimension | Today's state | Symptom |
|---|---|---|
| History depth | `depth=1` | `rebase`, `merge-base`, `log A..B`, `cherry-pick` misbehave; MR diffs can show every file as changed |
| Object completeness | `--filter=blob:none` / `tree:0` (moot at depth 1) | None today, but matters once depth is lifted |
| HEAD state & refs | Detached HEAD at SHA; no local branch; target branch not reliably present as `refs/remotes/origin/<x>` | "Detached HEAD" warnings, agent confused about current branch, `git push` needs explicit refspec, can't diff/rebase against target branch |
The git_unshallow workaround addresses only the first dimension, and addresses it the most expensive way possible — full unshallow of the entire history. It does nothing for the HEAD-state issues, which is where most of the observable agent confusion originates.
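To check which of the three dimensions a given workspace exhibits (for example, before and after a change), a small diagnostic can read each one directly from Git. The helper name `describe_workspace` and its `key=value` output format are hypothetical; the Git commands it calls are standard.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: report the three workspace dimensions
# (history depth, object completeness, HEAD state & refs).
set -euo pipefail

# Usage: describe_workspace <dir> <target_branch>
describe_workspace() {
  local dir="$1" target="$2"

  # History depth: is this a shallow repository?
  echo "shallow=$(git -C "$dir" rev-parse --is-shallow-repository)"

  # Object completeness: a partial clone records its filter on the remote.
  echo "filter=$(git -C "$dir" config --get remote.origin.partialclonefilter || echo none)"

  # HEAD state: symbolic-ref fails when HEAD is detached.
  local head
  if head=$(git -C "$dir" symbolic-ref -q --short HEAD); then
    echo "head=${head}"
  else
    echo "head=detached"
  fi

  # Refs: is the target branch available as a remote-tracking ref?
  if git -C "$dir" show-ref -q --verify "refs/remotes/origin/${target}"; then
    echo "target_ref=present"
  else
    echo "target_ref=missing"
  fi
}
```

On a workspace prepared the way the runner does it today, this would be expected to report `head=detached`, and `target_ref=missing` unless the target branch happens to be the one fetched.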
Proposed direction (AI-assisted investigation — to be validated)
Replace the current shallow + unshallow strategy with a workspace that resembles a developer's local checkout: full history (cheaply, via partial clone), all branch refs present, and HEAD attached to a real local branch.
Concretely, three changes that can be made independently:
1. Drop `GIT_DEPTH=1` from the default in `git_clone_variables`. Keep `--filter=blob:none` (or `--filter=tree:0`) for a cheap initial download with lazy object fetch. With `blob:none` this gives full commit and tree history immediately without paying for every blob; `tree:0` also defers trees, trading a smaller initial download for more lazy fetches later.
2. Ensure all branch refs are fetched so `origin/<target_branch>`, `origin/main`, etc. are local. This is cheap because of the partial clone.
3. Attach HEAD to a named local branch (`git checkout -B <workload_branch> <sha>`) so `git status`, `git branch`, and `git push` behave normally.
Because flow YAML should not contain setup steps, where changes 2 and 3 live (a runner config hook, the runner image entrypoint, or service-side commands) is to be decided during implementation.
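For local timing comparisons, a from-scratch sketch of the proposed workspace might look like the function below. It assumes the pipeline SHA is reachable from some branch (MR-only refs are not fetched here), and `prepare_workspace_proposed` is a hypothetical name. Setting `push.autoSetupRemote` (Git 2.37+) is an addition that is not in the proposal: on a freshly created branch with no upstream, a bare `git push` fails under the default `push.default=simple`, so some such setting, or an explicit `git push -u`, is needed for change 3 to fully hold.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the proposed developer-like workspace:
# no depth limit, partial clone, all branch refs, HEAD on a named branch.
set -euo pipefail

# Usage: prepare_workspace_proposed <repo_url> <sha> <workload_branch> <dir> [filter]
prepare_workspace_proposed() {
  local url="$1" sha="$2" branch="$3" dir="$4" filter="${5:-blob:none}"

  git init -q "$dir"
  git -C "$dir" remote add origin "$url"
  # Changes 1 and 2: no --depth, a blob (or tree) filter, and every
  # branch ref, so origin/<target_branch> is available locally.
  git -C "$dir" fetch -q --filter="$filter" origin \
    '+refs/heads/*:refs/remotes/origin/*'
  # Change 3: attach HEAD to a named local branch at the pipeline SHA.
  git -C "$dir" checkout -q -f -B "$branch" "$sha"
  git -C "$dir" clean -ffdx -q
  # Assumption (not in the proposal): let a plain `git push` create the
  # upstream branch on first push. Requires Git 2.37 or later.
  git -C "$dir" config push.autoSetupRemote true
}
```

Passing `tree:0` as the fifth argument gives the `no depth + tree:0` variant from the Definition of Done.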
This should also be designed to support multiple future clone modes (e.g., a future "initial-clone-only" mode at depth 1 with no history, for the fastest possible startup when a flow is known not to need history) by turning shallow_clone into a richer clone_mode parameter.
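One way to turn `shallow_clone` into a `clone_mode` parameter is a single mapping from mode to the runner variables that `git_clone_variables` sets today. It is sketched here in shell; in practice it would belong in `git_clone_variables` itself. The function `clone_variables` and the mode names `developer` and `initial_clone_only` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical mapping from a clone_mode parameter to runner variables.
# Mode names are illustrative, not an agreed interface.
set -euo pipefail

# Usage: clone_variables <clone_mode> [tree_zero]
# Prints KEY=VALUE lines, one per runner variable.
clone_variables() {
  local mode="$1" tree_zero="${2:-false}"
  local filter="--filter=blob:none"
  # Mirrors today's dap_git_tree_zero_option feature flag.
  if [ "$tree_zero" = true ]; then
    filter="--filter=tree:0"
  fi

  case "$mode" in
    developer)
      # Full (partial-clone) history: no GIT_DEPTH at all.
      echo "GIT_FETCH_EXTRA_FLAGS=${filter}"
      ;;
    initial_clone_only)
      # Fastest startup for flows known not to need history.
      # Illustrative choice: a filter adds little at depth 1, so omit it.
      echo "GIT_DEPTH=1"
      ;;
    *)
      echo "unknown clone_mode: ${mode}" >&2
      return 1
      ;;
  esac
  echo "GIT_LFS_SKIP_SMUDGE=1"
}
```

Keeping every mode in one `case` means a new mode is a single added branch, and an unknown mode fails loudly instead of silently falling back to a default.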
Considered alternatives
- Keep and optimize the unshallow step. Rejected: doesn't address detached HEAD or missing refs, scales badly with repo size, fragile timeout.
- Per-flow clone profiles (rebase needs history, implement-issue doesn't). Rejected for now: every current developer use case eventually needs history; maintaining the matrix isn't worth it. The `clone_mode` parameter leaves the door open if a genuinely no-history mode appears.
- Runner-side reference repo / cached working copy. Deferred: real win on monorepos, but orthogonal to the correctness problems above. Worth a follow-up issue.
Expectations / Definition of Done
- The setup is testable locally (e.g. against a local GitLab Runner) without requiring a full Duo platform deployment, so changes can be iterated on quickly.
- Timing measurements are collected and shared for each option that is seriously considered. Measure from CI job start (beginning of `get_sources`) to the moment the `developer_agent` node begins executing, i.e. the wall-clock time the user waits today, including the current `git_unshallow` step. Run on at least:
  - A large monorepo: `gitlab-org/gitlab`
  - A small-to-medium repo (pick a representative one)

  Options to compare at minimum: current behavior (`depth=1` + `blob:none` + unshallow step), no depth + `blob:none`, no depth + `tree:0`, and `depth=1` alone (no unshallow).
- The detached HEAD warning is gone in normal flow runs and `git status` / `git branch` report a real branch.
- The `git_unshallow` deterministic step is removed from `developer_unstable/1.0.0.yml` (and the equivalent `developer/2.0.0.yml` / `developer/2.0.0-orbit.yml` configs).
- The solution does not regress small-repo startup time.
Open questions
- Where should the HEAD-attach and base-branch-fetch live? Candidates: runner config (`post_get_sources_script`), `workflow-generic-image` entrypoint, service-side `set_up_executor_commands`. To be decided during implementation.
- Should the per-flow `clone_mode` be set in the flow YAML, in `CreateAndStartWorkflowService` params, or both?
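If the runner-config candidate (`post_get_sources_script`) is chosen, changes 2 and 3 reduce to a few commands run in the project directory after `GetSources`. A minimal sketch, assuming change 1 has already removed `GIT_DEPTH`: `attach_workspace` and the `DUO_WORKLOAD_BRANCH` variable in the usage comment are hypothetical, while `CI_PROJECT_DIR` and `CI_COMMIT_SHA` are standard GitLab CI predefined variables.

```shell
#!/usr/bin/env bash
# Hypothetical post-GetSources step: fetch every branch and attach HEAD.
set -euo pipefail

# Usage (e.g. from a runner hook; DUO_WORKLOAD_BRANCH is hypothetical):
#   attach_workspace "$CI_PROJECT_DIR" "$CI_COMMIT_SHA" "$DUO_WORKLOAD_BRANCH"
attach_workspace() {
  local dir="$1" sha="$2" branch="$3"

  # Change 2: make origin/<target_branch>, origin/main, ... available.
  # A partial clone remembers its filter, so this fetch stays filtered.
  git -C "$dir" fetch -q origin '+refs/heads/*:refs/remotes/origin/*'

  # Change 3: replace the runner's detached HEAD with a named branch
  # at the same commit, so `git status` and `git branch` report it.
  git -C "$dir" checkout -q -B "$branch" "$sha"
}
```

Because it only adds refs and moves HEAD onto a branch at the already checked-out commit, the working tree the runner produced is left unchanged.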
Related
- Epic: &18785
- Original orphan-commit issue that introduced the unshallow workaround: ai-assist#2109
- Current flow config: `developer_unstable/1.0.0.yml`
- Current clone variables: `start_workflow_service.rb#L154`
- Runner clone implementation: `functions/concrete/run/stages/get_sources.go`, `shells/abstract.go`