Architecture decision: entry point and execution engine for GitLabDuo mentions and Slack integration
@thomas-schmidt and I had a sync on the potential approaches for GitLabDuo mentions and Slack integration to figure out whether we can reuse the same approach for both use cases. Here is a summary:
Our requirements
We want a semi-async conversational agent experience: a user sends a message (MR comment, Slack message, etc.) and gets a response fast enough to feel conversational — not streaming like web chat, but not minutes either. This applies to @GitLabDuo mentions in MRs today and extends to Slack and potentially other messaging surfaces.
Both approaches need to meet these requirements:
- Latency: fast enough to feel responsive, ideally under 5 seconds (depending on the ask, e.g., a simple question vs. complex code analysis).
- Non-negotiable:
  - Composite identity: security requires this. The agent acts on behalf of a user with appropriate permissions, not as a privileged service account.
- As much as possible:
  - Visibility into AI interactions: users (and ideally admins) should be able to see what the agent did, including tool calls (through sessions, chat history, the intent-classification LLM call, or similar). This matters for governance, trust, and debugging.
The two approaches currently in flight
Headless Workhorse (headless injector)
A Sidekiq worker POSTs to a headless endpoint. Workhorse intercepts, runs the workflow (currently using the agentic chat) to completion via gRPC to DWS, and proxies tool calls back to Rails. The worker reads the result and posts it as @GitLabDuo.
There is a working POC that demonstrates this end-to-end with very low latency.
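To make the control flow concrete, here is a minimal Python sketch of the headless loop, assuming the DWS gRPC stream can be modelled as a sequence of events that are either tool calls or a final answer. Every name here (`ToolCall`, `FinalAnswer`, `fake_dws_session`, `run_headless`, `fake_rails`) is hypothetical and stands in for the real Workhorse, DWS, and Rails interfaces, which are not shown.

```python
from dataclasses import dataclass
from typing import Callable, Generator, Union


@dataclass
class ToolCall:
    name: str
    args: dict


@dataclass
class FinalAnswer:
    text: str


Event = Union[ToolCall, FinalAnswer]


def fake_dws_session() -> Generator[Event, object, None]:
    """Hypothetical stand-in for the gRPC stream to DWS.

    It requests one tool call, receives the result via send(), then answers.
    """
    mr = yield ToolCall("get_merge_request", {"iid": 42})
    yield FinalAnswer(f"MR !{mr['iid']} is titled '{mr['title']}'.")


def run_headless(session: Generator[Event, object, None],
                 call_rails: Callable[[ToolCall], object]) -> str:
    """Drive the workflow to completion, proxying every tool call back to Rails."""
    event = next(session)
    while isinstance(event, ToolCall):
        result = call_rails(event)    # Workhorse -> Rails (hypothetical hook)
        event = session.send(result)  # the tool result goes back to DWS
    return event.text                 # the Sidekiq worker posts this as @GitLabDuo


# Hypothetical Rails tool handler that answers the single tool call.
def fake_rails(call: ToolCall) -> dict:
    return {"iid": call.args["iid"], "title": "Add caching"}


answer = run_headless(fake_dws_session(), fake_rails)
print(answer)  # MR !42 is titled 'Add caching'.
```

In this sketch the caller only sees the final string; the intermediate tool calls never leave the loop.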
Messaging adapter via CI runner
A @GitLabDuo mention (e.g., in a Slack message or issue comment) triggers a CI workload in a lightweight, auto-created duo-workspace project. The workflow runs on a runner, and lifecycle events (e.g., WorkloadFinishedEvent) post the response back — as an MR note, Slack reply, or any other messaging surface (depending on the adapter that triggers it).
There is a working POC that demonstrates this end-to-end.
The two decisions
We think this discussion mixes two independent architectural decisions that are worth separating:
- Entry point: How does a mention get into the system and how does the response get back?
- Execution engine: Where does the agent actually run?
The current approaches bundle these together (Workhorse headless bundles a specific entry point with Workhorse execution; the messaging adapter bundles its entry point with runner execution), but they don't have to be coupled. For example, the messaging adapter could use Workhorse as its execution engine instead of a runner. Separating these makes the tradeoffs clearer.
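The decoupling can be expressed as a narrow interface between the two layers. The Python sketch below assumes an execution engine only needs a prompt and the acting user and returns the agent's answer; `ExecutionEngine`, `WorkhorseHeadlessEngine`, `CiRunnerEngine`, and `handle_mention` are hypothetical names, and the engine bodies are placeholders rather than real integrations.

```python
from typing import Callable, List, Protocol


class ExecutionEngine(Protocol):
    """Where the agent runs. Hypothetical interface, not an existing GitLab API."""

    def run(self, prompt: str, user: str) -> str: ...


class WorkhorseHeadlessEngine:
    def run(self, prompt: str, user: str) -> str:
        # Placeholder: would run the workflow against DWS through Workhorse.
        return f"[workhorse] {user}: {prompt}"


class CiRunnerEngine:
    def run(self, prompt: str, user: str) -> str:
        # Placeholder: would start a CI workload in the duo-workspace project.
        return f"[runner] {user}: {prompt}"


def handle_mention(prompt: str, user: str, engine: ExecutionEngine,
                   reply: Callable[[str], None]) -> None:
    """Entry point: knows how to receive a mention and reply, not where the agent runs."""
    reply(engine.run(prompt, user))


replies: List[str] = []
for engine in (WorkhorseHeadlessEngine(), CiRunnerEngine()):
    handle_mention("What does this MR change?", "alice", engine, replies.append)
print(replies)
```

Swapping the engine changes nothing in `handle_mention`, which is the point of treating the two decisions separately.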
Decision 1: Entry point
| Dimension | Current mention flow (Sidekiq → Workhorse) | Messaging adapter |
|---|---|---|
| How it works | Sidekiq worker POSTs to a headless Workhorse endpoint. Includes an upfront intent classification call (silent, user doesn't see it) to route the request. | TriggerFlowService creates a workflow with a callback context. On completion, CallbackWorker dispatches the result to the right adapter (Slack, MR note, etc.). |
| Reusability across surfaces | Purpose-built for MR mentions. Each new surface (Slack, issues, etc.) would need its own integration path, or we would end up rebuilding the messaging adapter pattern. | Single entry point: adding a new surface means adding an adapter, not a new pipeline. |
| Routing / intent | Intent classification happens upfront as a silent call before processing. | Routing is handled by which flow is triggered. No intent classification today. |
| Maturity | Working POC for MR mentions. | Adapter pattern exists. MR mention adapter would be new but follows established pattern. |
Key question: Do we want a single entry point pattern for all messaging surfaces, or are surface-specific integrations acceptable?
Decision 2: Execution engine
This is independent of the entry point. Either entry point could use either engine.
| Dimension | Workhorse headless | CI Runner |
|---|---|---|
| Composite identity | Not yet: needs to be added (currently bound to the ai_workflow scope) | In place |
| Response time for simple questions | ~1-2s | ~5s (lightweight workspace project, no clone). Depends on runner availability; startup-time improvements would help and could also make it possible to always provide a cloned project without adding response time |
| Permission control | The chat flow has write tools that require human approval, and it is unclear how approval would work in a headless scenario. Today this is mitigated by LLM intent classification, which never routes write actions to agentic chat. | Controlled via the flow definition (which tools are available); composite identity enforces permission boundaries |
| Repo access | Limited to API-provided data | Has access to a development environment (computer): can do API requests but also clone code if it needs deeper context — an extra capability, not a latency penalty by default |
| Infrastructure | New headless Workhorse service to build and maintain | Uses existing runner infrastructure that all foundational flows already use |
| Runner compute | Does not consume runner compute | Uses runner compute |
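The runner column's permission model combines two independent checks. The sketch below assumes a tool call is allowed only if the flow definition exposes the tool and, for composite identity, both the human user and the agent's service account hold the permission the tool needs. `FlowDefinition`, `tool_allowed`, and the permission strings are hypothetical; this is not GitLab's actual authorization code.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class FlowDefinition:
    name: str
    tools: FrozenSet[str]  # tools this flow exposes to the agent


# Hypothetical read-only flow for simple questions.
READ_ONLY_FLOW = FlowDefinition(
    "read_only_mention", frozenset({"get_merge_request", "list_notes"})
)


def tool_allowed(flow: FlowDefinition, tool: str, required_permission: str,
                 user_permissions: FrozenSet[str],
                 agent_permissions: FrozenSet[str]) -> bool:
    """Allow a tool call only if the flow exposes it and both identities may act."""
    if tool not in flow.tools:
        return False  # the flow definition is the first boundary
    # Composite identity (as modelled here): the effective permissions are the
    # intersection of the user's and the service account's permissions.
    return required_permission in (user_permissions & agent_permissions)


user = frozenset({"read_merge_request"})
agent = frozenset({"read_merge_request", "create_note"})
print(tool_allowed(READ_ONLY_FLOW, "get_merge_request", "read_merge_request", user, agent))  # True
print(tool_allowed(READ_ONLY_FLOW, "create_note", "create_note", user, agent))  # False
```

Because the two checks are independent, narrowing the flow's tool list and narrowing the user's permissions each shrink what the agent can do on their own.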
Key questions:
- Is ~5s acceptable? If yes, the latency difference is not a deciding factor.
- How much weight does composite identity carry? If it's a blocker for shipping (as it seems from our security requirements), the runner path is ready today; Workhorse needs additional work.
- Is a new execution path justified? There are already several triggering approaches. Is the latency gain worth adding another path to maintain?
Separate topics
These need decisions but are independent of entry point and execution engine. They can be iterated on separately:
| Topic | Notes |
|---|---|
| Which flow / agent to use | For read-only questions, we can restrict the agent to read-only tools. The Duo Developer flow works today, but it could be swapped out or could delegate to specialized flows. |
| Context injection | What context do we automatically provide to the agent? API-fetched context vs. cloned repo vs. hybrid. Running a CI job in the lightweight duo-workspace project is fast, but running it in the MR's own project would trigger the automatic repository clone that CI jobs perform, which can take a long time. |
| Intent classification | Where does it live — upfront (as in the current mention flow) or within the flow itself? An upfront call adds latency and opacity; an in-flow approach gives the agent more control but may need guardrails. |
| Chat history / visibility | Chat history is currently filtered by the chat flow: anything that uses the chat flow shows up there. We need to decide how these agent interactions surface distinctly. |
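For the intent-classification topic, the upfront option amounts to one extra call in front of every request. The sketch below uses a keyword check as a stand-in for the silent LLM call; `classify_intent`, `route_upfront`, `WRITE_HINTS`, and the flow labels are hypothetical, and a real classifier would be a model call rather than string matching.

```python
from typing import Callable, Dict

# Keyword stand-in for the LLM intent-classification call (hypothetical).
WRITE_HINTS = frozenset({"fix", "change", "update", "apply", "commit"})


def classify_intent(message: str) -> str:
    words = message.lower().split()
    return "write" if any(word in WRITE_HINTS for word in words) else "read"


def route_upfront(message: str, flows: Dict[str, Callable[[str], str]]) -> str:
    """Classify first, then start the chosen flow.

    The classification is an extra round trip the user never sees, which is
    the latency and opacity cost of the upfront option.
    """
    return flows[classify_intent(message)](message)


# Hypothetical flows: one limited to read-only tools, one write-capable.
flows = {
    "read": lambda m: f"read-only flow: {m}",
    "write": lambda m: f"developer flow: {m}",
}
print(route_upfront("What does this MR do?", flows))
print(route_upfront("Please fix the typo", flows))
```

In the in-flow alternative, this routing step disappears and the agent decides for itself within a single flow, so the guardrails have to come from the tools that flow exposes.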
Next steps
- PoC: Validate the messaging adapter path for @GitLabDuo MR mentions end-to-end (latency + response quality)
- Discuss and resolve the two decisions above with input from principal engineers across both approaches
- Get alignment from product on timelines, as they could affect how much time we have to make this work
Related issue: https://gitlab.com/gitlab-org/gitlab/-/work_items/585237+