Add full-clone workspace option for Duo Developer
What does this MR do and why?
Gives the Duo Developer flow a full, developer-like git workspace instead of the default shallow partial clone, behind the dap_full_clone feature flag.
Today the runner produces a shallow (depth=1) partial (blob:none) clone with a detached HEAD and no base branch ref, and the flow then runs a slow git fetch --unshallow workaround. This causes startup latency/timeouts on large repos and leaves the agent fighting the git state (re-attaching HEAD, base-branch diffs/rebase failing).
When dap_full_clone is enabled (scoped to the developer flow), the runner skips its own clone (GIT_STRATEGY=none) and we perform a real git clone, then check out the source branch (or the default branch when none is given): full history, HEAD attached to a named branch, all refs present. This removes the unshallow step and the detached-HEAD confusion.
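The branch selection described above can be sketched in a few lines. This is only an illustration, not the MR's code: `checkout_target` and its parameters are hypothetical names, and treating a blank source branch as "none given" is an assumption.

```ruby
# Hypothetical helper: pick the branch the full clone should check out.
# Falls back to the project's default branch when no source branch is given
# (nil, empty, or whitespace-only are all treated as "not given").
def checkout_target(source_branch, default_branch)
  branch = source_branch.to_s.strip
  branch.empty? ? default_branch : branch
end

puts checkout_target(nil, "master")             # fresh issue: default branch
puts checkout_target("feature/login", "master") # MR: source branch
```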
Background, root-cause and the cross-team discussion (Gitaly/Git): #602558 (comment 3455834144)
Implementation
- `dap_full_clone` feature flag (development, default off).
- `StartWorkflowService#full_clone?`: gated on `workflow_definition == "developer/v1"` and the flag; non-developer flows are unaffected.
- `git_clone_variables` returns `GIT_STRATEGY=none` (+ `GIT_LFS_SKIP_SMUDGE`) so the runner skips its own clone.
- Auth without the runner. With `GIT_STRATEGY=none` the runner does no GetSources, so it injects no credentials. We add a git `credential.helper` (via `GIT_CONFIG`) so the token authenticates our clone and the agent's later push/fetch, without ever landing in the clone URL or `.git/config` on disk.
- Proactive auth. We also set `http.proactiveAuth=basic`. The runner's `FF_USE_GIT_PROACTIVE_AUTH` only covers the runner's own GetSources (which we skip), so we set it ourselves: our clone sends credentials on the first request instead of after a `401`, keeping git requests in the authenticated rate-limit bucket. Requires git >= 2.46; older git silently ignores it, so it is inert on executor images shipping an older git and activates automatically once the image carries a new enough git.
- Cache-safe clone. `full_clone_setup_commands` clones into a sibling dir and moves only `.git` into the build dir, then `git checkout -f`. This preserves any cache/artifacts the runner restored into the build dir (a plain `git clone` refuses a non-empty target, and clearing the dir first would discard the cache). Runs before git hooks / `setup_script` so they see a populated repo.
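The auth and cache-safe clone items above might look roughly like the following. This is a sketch under stated assumptions, not the MR's implementation: the helper names, the `DAP_GIT_USER` / `DAP_GIT_TOKEN` variables, the `.clone-tmp` suffix, and the use of `git clone --no-checkout` are all illustrative. Passing config through git's `GIT_CONFIG_COUNT` / `GIT_CONFIG_KEY_<n>` / `GIT_CONFIG_VALUE_<n>` environment variables is one way to read "via `GIT_CONFIG`".

```ruby
require "shellwords"

# Git configuration supplied through the environment, so no credential is
# written to the clone URL or to .git/config. The helper is an inline shell
# function that reads the token from the job environment at run time.
# DAP_GIT_USER and DAP_GIT_TOKEN are hypothetical variable names.
def git_config_env
  helper = '!f() { echo "username=${DAP_GIT_USER}"; echo "password=${DAP_GIT_TOKEN}"; }; f'
  entries = {
    "credential.helper" => helper,
    "http.proactiveAuth" => "basic" # unknown to git < 2.46, which ignores it
  }
  env = { "GIT_CONFIG_COUNT" => entries.size.to_s }
  entries.each_with_index do |(key, value), i|
    env["GIT_CONFIG_KEY_#{i}"] = key
    env["GIT_CONFIG_VALUE_#{i}"] = value
  end
  env
end

# Cache-safe clone: clone into a sibling directory, move only .git into the
# build directory (which may already hold restored cache/artifacts), then
# force-checkout the target branch. Returns shell command strings.
def cache_safe_clone_commands(repo_url:, build_dir:, branch:)
  tmp_dir = "#{build_dir}.clone-tmp" # hypothetical sibling directory name
  url, dir, tmp, ref = [repo_url, build_dir, tmp_dir, branch].map { |s| Shellwords.escape(s) }
  [
    "rm -rf #{tmp}",
    "git clone --no-checkout #{url} #{tmp}",
    "mkdir -p #{dir}",
    "rm -rf #{dir}/.git",
    "mv #{tmp}/.git #{dir}/.git",
    "rm -rf #{tmp}",
    "git -C #{dir} checkout -f #{ref}"
  ]
end

puts cache_safe_clone_commands(
  repo_url: "https://gitlab.example.com/group/project.git",
  build_dir: "/builds/group/project",
  branch: "main"
)
```

Because the token only appears as a shell variable reference inside the helper, neither the generated commands nor the config values contain the secret itself.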
No behavior change until the flag is enabled.
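The gating can be illustrated with a small stand-alone sketch. The method names mirror those mentioned above, but the signatures and bodies here are assumptions: the flag state is passed in as a plain boolean instead of a `Feature` lookup, the `"1"` value for `GIT_LFS_SKIP_SMUDGE` is assumed, and an empty hash stands in for whatever the existing shallow-clone path returns.

```ruby
DEVELOPER_FLOW = "developer/v1"

# Stand-in for the flag check: true only for the developer flow with the
# flag enabled. In the MR the flag is dap_full_clone.
def full_clone?(workflow_definition:, flag_enabled:)
  flag_enabled && workflow_definition == DEVELOPER_FLOW
end

# CI variables that make the runner skip its own clone. The empty hash is a
# placeholder for the unchanged default path (an assumption of this sketch).
def git_clone_variables(workflow_definition:, flag_enabled:)
  return {} unless full_clone?(workflow_definition: workflow_definition, flag_enabled: flag_enabled)

  {
    "GIT_STRATEGY" => "none",
    "GIT_LFS_SKIP_SMUDGE" => "1" # assumed value; the MR text names only the variable
  }
end

p git_clone_variables(workflow_definition: "developer/v1", flag_enabled: true)
p git_clone_variables(workflow_definition: "developer/v1", flag_enabled: false)
```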
Validation (local GDK, runner path)
Tested through the real runner path on a heavy GDK import of the GitLab repo (515k commits, 2.5 GB), triggering the actual Duo Developer flow and toggling only the feature flag (same code, same runner). Fresh issue per run.
| Workspace property | Flag off — baseline | Flag on — this MR |
|---|---|---|
| runner step | shallow + detached checkout | Skipping Git repository setup (GIT_STRATEGY=none) |
| shallow? | true | false |
| HEAD | detached | attached to a real branch |
| commits reachable | 1 | 515,192 |
| base ref present? | no | yes |
| partial filter / promisor | `blob:none` / true | none / none (true full clone) |
| `git log <base>..HEAD` | fails (exit 128) | works (exit 0) |
| startup git work (GDK net) | ~20s shallow + ~83s `git fetch --unshallow` | ~103s `git clone`, no unshallow |
- The flag produces a developer-identical workspace; base-branch operations (`log base..HEAD`, diffs, rebase) work where the baseline hard-fails.
- "Fresh issue ⇒ default branch, MR ⇒ source branch" behaves as intended.
- On a trivial task, full clone adds no agent overhead (LLM-call count on par with baseline once the fix below was in place). It doesn't yet show a saving because the task never exercises blame/history — that's where the win is expected.
- Startup cost on GDK's local network is comparable (~103s either way); the real monolith cost is what bundle URI / Scaling-Git work addresses (see the linked discussion).
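For anyone repeating the comparison, the workspace properties in the table can be collected with a few standard git commands. The sketch below is not the script used for the numbers above; `run_git` and `workspace_report` are hypothetical helpers, and the git runner is injectable so the logic can be exercised without a repository.

```ruby
require "open3"

# Default runner: executes git in `dir` and returns [stdout, success?].
def run_git(dir, *args)
  out, _err, status = Open3.capture3("git", "-C", dir, *args)
  [out.strip, status.success?]
end

# Collect the workspace properties compared in the table.
# `git` is any callable taking git arguments and returning [stdout, success?].
def workspace_report(git)
  shallow, = git.call("rev-parse", "--is-shallow-repository")
  # With -q, symbolic-ref exits non-zero (and prints nothing) on a detached HEAD.
  branch, attached = git.call("symbolic-ref", "--short", "-q", "HEAD")
  commits, = git.call("rev-list", "--count", "HEAD")
  # `config --get` exits non-zero when no partial-clone filter is configured.
  filter, has_filter = git.call("config", "--get", "remote.origin.partialclonefilter")
  {
    shallow: shallow == "true",
    head: attached ? branch : "detached",
    commits_reachable: commits.to_i,
    partial_filter: has_filter ? filter : "none"
  }
end

# Usage inside a job's build directory:
#   p workspace_report(->(*args) { run_git(".", *args) })
```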
How to set up and validate locally
Requires a project where the Duo Developer flow can run (Duo features enabled, a service account assigned). On .com, enable the flag per-project.
- Enable the flag for the project under test:
  `Feature.enable(:dap_full_clone, Project.find_by_full_path('group/project'))`
- Trigger the Duo Developer flow on an issue or MR (assign the service account).
  - On an issue → workspace checks out the project's default branch.
  - On an MR → checks out the MR's source branch.
- Confirm the full clone in the workload CI job log: `Skipping Git repository setup` followed by `Cloning into '/builds/<group>/<project>'...` (instead of `Checking out <sha> as detached HEAD …`), and no `git fetch --unshallow` step later.
- Disable when done:
  `Feature.disable(:dap_full_clone, Project.find_by_full_path('group/project'))`
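The log check in the confirmation step can also be done mechanically. The following is a minimal sketch, assuming the job log is available as a string and that the unshallow step, when present, shows up literally as `git fetch --unshallow`; `full_clone_log?` is a hypothetical helper.

```ruby
# Markers taken from the job-log lines quoted in the steps above.
FULL_CLONE_MARKERS = ["Skipping Git repository setup", "Cloning into"].freeze
SHALLOW_MARKERS = ["as detached HEAD", "git fetch --unshallow"].freeze

# True when the log shows the full-clone path and none of the baseline
# (shallow, detached, unshallow) markers.
def full_clone_log?(log)
  FULL_CLONE_MARKERS.all? { |marker| log.include?(marker) } &&
    SHALLOW_MARKERS.none? { |marker| log.include?(marker) }
end

sample = <<~LOG
  Skipping Git repository setup
  Cloning into '/builds/group/project'...
LOG
puts full_clone_log?(sample)
```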
Follow-ups (out of scope here)
- Replace the `shallow_clone` boolean with a caller-specifiable `clone_mode` (full/shallow/none); this MR is the transitional first step.
- Remove the now no-op `git_unshallow` step from the developer flow config (duo-workflow-service repo) once this is validated.
- Default-branch landing: on a fresh issue the agent starts on the (often protected) default branch and may try to commit/push there before branching. Add guidance (system prompt / AGENTS.md) to create a feature branch first, or check out a fresh working branch for that case.
- MR-triggered flows do not pass the MR source branch (#603034): MR runs currently land on the default branch; the trigger must plumb the MR source branch. Independent of this MR's clone logic.