Duo Developer posts "test" comments when replying in threads
## Summary
When Duo Developer is asked to reply in a discussion thread, it sometimes posts one or more throwaway **"test" comments** before posting the real reply. These probe comments are visible to everyone and cannot be cleaned up afterwards, because the `ai_workflows` token cannot delete notes. Reviewers have hit this several times and it makes the agent look noisy/unpolished on otherwise successful runs.
This issue captures the signal from three sessions and proposes a fix.
Reported in the Developer 2.0.0 feedback issue: [gitlab-org/modelops/applied-ml/code-suggestions/ai-assist#2298](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/work_items/2298) (notes by @bastirehm, @thomas-schmidt / @mcorren, and @.luke).
## What we observed
Three independent sessions, same pattern:
| Reporter | Surface | Probe comment(s) | Real reply |
|----------|---------|------------------|------------|
| @mcorren | issue thread via `glab api` | `test reply` | starts with `@`; contains backticks, `$`, emoji |
| @bastirehm | work-item thread via `glab api` | `@-`, then `test note via glab` | long markdown research |
| @.luke | MR thread (`glab mr note create --reply` available) | `test reply` | markdown with a diff block |
The shape is identical every time:
1. The task is always a **threaded reply** with a complex, markdown-heavy body.
2. The agent posts a **trivial probe write first** to confirm the call works, then sends the real body a few seconds later.
3. It **cannot delete the probe**, so it stays. In @.luke's case a human had to resolve the leftover notes.
### Root cause
A tool-by-tool replay of the @mcorren trace shifts the emphasis: the "test" comment is a **symptom**, not the root issue. The probe only appeared after the agent got stuck, and it got stuck for three reasons:
1. **Skill activation — the agent never opened the glab skill.** The skill was registered and available, but on this run the agent only `read_file`'d `gitlab-coding-principles`; it never opened `glab`. So the guidance it needed (the `-f` vs `-F`/`@file` distinction, the write-to-file/stdin pattern) was one unopened file away. This is the key insight: improving skill _content_ alone wouldn't have changed this run, because the skill wasn't consulted. The durable fix has to make the right skill more likely to be read, not just better written.
2. **The goal template fights the skill.** The delivery instructions currently say _"Do NOT use `glab issue note` or `glab mr note`… use `glab api` instead"_. That's wrong for MRs — `glab mr note create --reply` is the correct, safe, first-class threaded reply — and it actively pushes the agent onto the fragile raw-API path. This likely caused @.luke's probe on an MR where the safe command existed.
3. **CLI ergonomics for rich bodies.** Once on the raw-`glab api` path, inlining markdown is a minefield: bodies starting with `@` are read as filenames (`-F`/`@file`), and backticks/`$`/emoji break shell quoting. The agent hit three quoting failures in a row, posted `test reply` to isolate the problem, then couldn't `DELETE` it (token scope), and finally rediscovered the file-based pattern by trial and error.
So the leftover probe is what trial-and-error looks like when the safe command isn't reached and the skill that documents it isn't read. The fix therefore targets **skill-read reliability** and the **goal-template conflict** first, with the "no probe" prompt rule as a backstop rather than the primary lever.
## Proposal
Given the root cause above, the durable fix is about **getting the right command used on the first try**:
**1. Make skills more likely to be read (highest leverage, helps all flows)**
- Sharpen the glab skill's `description:` to front-load trigger verbs and signal its breadth, e.g. "read before running _any_ `glab`/GitLab API command (MRs, issues, work items, discussions/replies, comments, CI/CD)".
- Strengthen the generic agent-skills instruction (executor `AgentSkillsResolver` + `workspace_agent_skills/base`) from "load when it matches" to a precondition: read the matching skill before acting in its domain; don't rely on prior tool knowledge when a skill exists.
**2. Reinforce the glab skill content** ([gitlab-org/cli#8316 (closed)](https://gitlab.com/gitlab-org/cli/-/issues/8316))
- Lead with the **write-to-file / stdin** pattern for threaded replies; warn that a body starting with `@` must not go through `-F`/`@file`; reiterate "no probe/test comments — writes are irreversible with this token, build the body in a file, verify, post once."
**3. Fix the goal template (`mention.rb`)**
- Stop forbidding `glab mr note create --reply`. Reduce the delivery instructions to **intent + the dynamic `discussion_id` + a pointer to the glab skill**, and keep glab mechanics out of the prompt.
**4. Durable follow-up ([cli#8316](https://gitlab.com/gitlab-org/cli/-/issues/8316)): add `glab issue note create --reply <discussion-id>`**
- `AddIssueDiscussionNote` and `ListIssueDiscussions` already exist in client-go, so this is a near line-for-line port of the MR `--reply` flag. It also covers work items, which share the issues discussions API. This would let us **remove** the raw-`glab api` reply instructions from the goal template and make issue and MR replies symmetric and quoting-safe.
## References
- Feedback issue: [#2298](https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/work_items/2298)
- @mcorren job trace: [gitlab-org/gitlab/-/jobs/14687741305](https://gitlab.com/gitlab-org/gitlab/-/jobs/14687741305)
- @bastirehm session: [4461255](https://gitlab.com/gitlab-org/gitlab/-/automate/agent-sessions/4461255) (probes: [note_3456685763](https://gitlab.com/gitlab-org/gitlab/-/work_items/602887#note_3456685763), [note_3456699041](https://gitlab.com/gitlab-org/gitlab/-/work_items/602887#note_3456699041))
- @.luke session: [4247743](https://gitlab.com/gitlab-org/iglu/-/automate/agent-sessions/4247743) (probe: [iglu!192#note_3435270509](https://gitlab.com/gitlab-org/iglu/-/merge_requests/192#note_3435270509))
- glab skill issue: [gitlab-org/cli#8316 (closed)](https://gitlab.com/gitlab-org/cli/-/issues/8316)