content/handbook/engineering/architecture/design-documents/duo_workflow/_index.md +23 −0

@@ -209,6 +209,29 @@

our executors and the Duo Workflow Service and therefore remove the need for
our executors to proxy requests to the GitLab instance for self-managed as
documented below.

#### From messaging services (Slack, Teams, etc.)

The Duo Messaging Service allows users to trigger workflows from external messaging platforms by @mentioning Duo. It uses the same CI pipeline execution path as remote workflows but with a different trigger mechanism:

1. User @mentions Duo in a messaging service (e.g., Slack)
2. The messaging service sends an event to GitLab Rails
3. A messaging adapter translates the event into a goal and callback context
4. The orchestrator resolves the user's `duo_default_namespace`, finds or creates a `duo-workspace` project, and triggers a `developer/v1` flow
5. The agent runs in CI with the same composite identity as Duo Developer
6. When the workflow completes, a `CallbackWorker` (subscribed to `WorkloadFinishedEvent` via EventStore) delivers the result back through the adapter to the messaging service

The adapter pattern allows adding new messaging platforms by implementing a small interface (~5 methods) without changing the orchestration or execution infrastructure.

For the full architecture, see [ADR 008: Duo Messaging Service](decisions/008_duo_messaging_service.md).

### Self-managed architecture

#### With local Workflow service

content/handbook/engineering/architecture/design-documents/duo_workflow/decisions/008_duo_messaging_service.md 0 → 100644 +354 −0

---
title: "Duo Agent Platform ADR 008: Duo Messaging Service"
status: proposed
creation-date: "2026-04-17"
authors: [ "@thomas-schmidt" ]
coach: [ ]
approvers: [ ]
owning-stage: "~devops::ai_powered"
participating-stages: []
toc_hide: true
---

## Context

We want users to interact with Duo from external messaging services, starting with Slack, then Microsoft Teams, WhatsApp, Telegram, and others. A user @mentions Duo, gives it a task, and Duo works on it asynchronously and posts back the result.

Two challenges are specific to messaging:

1. CI pipelines require a project, but messaging services have no project context
2. Multiple messaging platforms need to be supported without duplicating orchestration logic

### Alternatives considered

Five approaches were investigated:

1. **CI job (Flows API)**: Trigger a CI pipeline via the existing Flows infrastructure. Battle-tested, ADR 004 compliant, no Workhorse or DWS changes. The only approach that provides a real execution environment: the agent can git clone, run tests, install tools, and do full development tasks. Downside: CI startup latency (~10s with empty project). Requires a project for the pipeline, solved by auto-creating a workspace project.

2. **WebSocket blocking**: Sidekiq worker opens a WebSocket to Workhorse and keeps it open for the full workflow duration. Simple, supports streaming. Downside: blocks a Sidekiq thread for up to 5 minutes per request, limiting throughput to ~50 concurrent workflows per Sidekiq process. No execution environment: the agent runs inside Workhorse with no filesystem, no git, no ability to run commands. Limits the agent to read-only API interactions with no path to development tasks.

3. **WebSocket fire-and-forget**: Sidekiq opens WebSocket, sends start request, disconnects immediately. **Blocked**: prototyping revealed Workhorse terminates the workflow when the client disconnects (sends `StopWorkflow` on clean close, tears down gRPC on abnormal close). Would require Workhorse changes to add a headless/detached mode. Same execution environment limitation as option 2.

4. **Direct gRPC**: Sidekiq opens a gRPC bidi stream directly to DWS. Lower latency, type-safe. **Violates ADR 004** (introduces a second path to DWS). Must reimplement HTTP action proxying in Ruby. No established pattern for gRPC bidi streaming from Sidekiq in the codebase. Same execution environment limitation: no filesystem or tooling available.

5. **Workhorse headless HTTP**: New Workhorse endpoint that accepts a workflow trigger via HTTP POST and manages the gRPC stream internally. **Requires cross-team Workhorse changes** (~50-100 lines of Go) and a modified runner lifecycle. Same execution environment limitation as options 2-4: no path to development tasks without additional architecture.

## Decision

Use the **Flows API (CI job)** approach with an **adapter pattern** for multi-platform support and a **per-namespace workspace project** to provide CI context.

### Architecture

```mermaid
graph TB
    classDef messaging fill:#dbeafe,stroke:#93c5fd,color:#1e3a5f
    classDef adapter fill:#d1fae5,stroke:#6ee7b7,color:#065f46
    classDef orchestrator fill:#ffedd5,stroke:#fdba74,color:#7c2d12
    classDef execution fill:#ede9fe,stroke:#c4b5fd,color:#3b0764
    classDef callback fill:#fef9c3,stroke:#fde047,color:#713f12
    classDef workspace fill:#f3e8ff,stroke:#d8b4fe,color:#581c87

    subgraph MSG["💬 MESSAGING SERVICES"]
        Slack(["Slack"])
        MTeams(["Microsoft Teams"])
        Others(["WhatsApp · Telegram · ..."])
    end

    subgraph ADAPT["🔌 ADAPTERS: one per messaging service"]
        direction LR
        SA["Slack Adapter<br/>👀 ✅ ❌"]
        TA["Teams Adapter"]
        OA["..."]
    end

    subgraph ORCH["⚙️ ORCHESTRATOR"]
        direction LR
        O1["Resolve user's<br/>duo_default_namespace<br/>(root namespace)"] --> O2["Find or create<br/>duo-workspace project"] --> O3["Delegate to<br/>ExecuteWorkflowService"]
    end

    subgraph EXEC["🏃 CI RUNNER"]
        CI["Agent executes in duo-workspace<br/><i>Tools · GitLab API · MCP · git clone</i>"]
    end

    subgraph CBGRP["📬 ASYNC CALLBACK"]
        CW["CallbackWorker<br/><i>Subscribes to WorkloadFinishedEvent</i>"]
    end

    subgraph WS["📁 duo-workspace: per top-level namespace"]
        direction LR
        W1["agent-config.yml<br/><i>image · scripts · cache</i>"]
        W2["AGENTS.md<br/><i>instructions</i>"]
        W3["CI/CD vars<br/><i>secrets · keys</i>"]
        W4["Runner tags<br/><i>dedicated runners</i>"]
    end

    Slack --> SA
    MTeams --> TA
    Others --> OA
    SA & TA & OA -->|"goal + callback_context"| ORCH
    O3 -->|"start pipeline"| CI
    CI -.->|"WorkloadFinishedEvent"| CW
    CW -.->|"deliver_result / on_flow_failed"| ADAPT
    O2 -.-|"creates / uses"| WS

    class Slack,MTeams,Others messaging
    class SA,TA,OA adapter
    class O1,O2,O3 orchestrator
    class CI execution
    class CW callback
    class W1,W2,W3,W4 workspace
```

**Solid arrows** = synchronous calls

**Dashed arrows** = async events

### Request flow

```mermaid
sequenceDiagram
    participant User
    participant Slack as Messaging Service
    participant Adapter
    participant Orchestrator
    participant CI as CI Runner
    participant Worker as CallbackWorker

    User->>Slack: @duo find open MRs for project X
    Slack->>Adapter: event

    rect rgb(209, 250, 229)
        Note right of Adapter: Trigger phase (sync)
        Adapter->>Orchestrator: trigger(goal, callback_context)
        Orchestrator->>Orchestrator: resolve namespace → workspace project
        Orchestrator-->>Adapter: success
        Adapter->>Slack: 👀 on_flow_started
    end

    rect rgb(237, 233, 254)
        Note right of CI: Execution phase (async)
        Orchestrator->>CI: start pipeline
        CI->>CI: Agent uses tools, APIs,<br/>git clone as needed
    end

    rect rgb(254, 249, 195)
        Note right of Worker: Callback phase (async)
        CI-->>Worker: WorkloadFinishedEvent
        Worker->>Worker: Extract answer from checkpoints
        Worker-->>Adapter: deliver_result
        Adapter->>Slack: Post answer in thread
        Adapter->>Slack: 👀 → ✅ on_flow_completed
    end
```

### Key design choices

**Agent flow via Flows API, delegating to `ExecuteWorkflowService`.** The orchestrator triggers an agent flow in the workspace project using the existing Flows API. It delegates to the same `ExecuteWorkflowService` used by the existing trigger paths, avoiding duplication of privilege handling, token generation, and workflow start logic. The messaging service passes the thread context as the goal. Initially this uses the same `developer/v1` flow that powers Duo Developer, giving the agent full capabilities (tools, GitLab API, MCP, git) from day one.

**`duo-workspace` auto-created project.** A private, empty project per top-level namespace provides CI pipeline context. This is the path forward for the internal MVC. The exact project name (`duo-workspace`) is not final and can be iterated on in a follow-up.

The workspace project is created at the **root namespace** of the user's `duo_default_namespace`. For example, if the user's default namespace is `gitlab-org/editor-extensions`, the workspace project is created at `gitlab-org/duo-workspace`, not `gitlab-org/editor-extensions/duo-workspace`.
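
The root-namespace rule above can be sketched in a few lines of Ruby. This is a minimal illustration under stated assumptions, not the orchestrator's actual code: `workspace_project_path` and `WORKSPACE_PROJECT_NAME` are hypothetical names, and the helper operates on a plain full-path string rather than on GitLab's namespace models.

```ruby
# Hypothetical helper (not GitLab's implementation): derives the workspace
# project path from the full path of a user's duo_default_namespace.
WORKSPACE_PROJECT_NAME = "duo-workspace" # the ADR notes this name is not final

def workspace_project_path(default_namespace_full_path)
  segments = default_namespace_full_path.to_s.split("/").reject(&:empty?)
  raise ArgumentError, "namespace path is empty" if segments.empty?

  # The first path segment is the root (top-level) namespace, so nested
  # groups all resolve to the same workspace project.
  "#{segments.first}/#{WORKSPACE_PROJECT_NAME}"
end

puts workspace_project_path("gitlab-org/editor-extensions")
# prints "gitlab-org/duo-workspace"
```

Under this sketch, `gitlab-org` and `gitlab-org/editor-extensions` both map to `gitlab-org/duo-workspace`.
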
This keeps one workspace project per top-level group, avoiding proliferation of projects across nested namespaces.

The workspace project is created when the admin enables the `developer/v1` flow for the namespace (using admin permissions), with a fallback find-or-create at trigger time for robustness. This avoids permission issues since regular users may not have `create_projects` access. Existing namespaces that already have `developer/v1` enabled before this ships will need a backfill migration (follow-up).

Teams customize the workspace project (Docker image, AGENTS.md, skills, CI variables, runner tags) using existing project features. This follows the same pattern as Security Policy Projects.

**Namespace resolution via `duo_default_namespace`.** No new configuration is needed: this reuses the existing user preference. The root namespace of this preference determines the top-level group for workspace project resolution.

**`developer/v1` must be enabled.** The orchestrator validates upfront that the `developer/v1` foundational flow is enabled for the user's namespace. If not, messaging returns an actionable `:flow_not_enabled` error guiding the user to ask their admin to enable it. This early check avoids confusing downstream failures (e.g., "Could not resolve service account") and lets each adapter craft an appropriate user-facing message.

**Adapter pattern.** Each messaging platform implements an adapter with lifecycle hooks (`deliver_result`, `deliver_error`, `on_flow_started`, `on_flow_completed`, `on_flow_failed`). The orchestrator, workspace project, and callback infrastructure are shared.

**EventStore callback.** `CallbackWorker` subscribes to `WorkloadFinishedEvent`, checks for `messaging_callback_context` on the workflow record (JSONB column), and delivers results through the adapter. No GraphQL, no polling.

The callback context contains adapter-specific delivery information. For Slack, it could look something like:

```json
{
  "adapter": "slack",
  "team_id": "T0123ABC",
  "channel_id": "C0123ABC",
  "thread_ts": "1234567890.123456",
  "message_ts": "1234567890.123456",
  "user_id": "U0123ABC"
}
```

**Reuses the `developer/v1` catalog service account.** Messaging is a trigger mechanism for `developer/v1`, not a separate flow. The service account identity reflects the flow being executed, not the trigger source. The orchestrator resolves the existing SA created when an admin enables the Developer flow for the namespace. No separate messaging SA is created. If `developer/v1` is not enabled, there is no SA, and messaging returns a clear error.

The SA uses `composite_identity_enforced: true`, the same security model used by Duo Developer and other agent platform flows. Effective permissions are the intersection of the triggering user's and the service account's access.

### Path to streaming and human approval

The architecture extends to real-time progress and interactive features without changing the core design:

```mermaid
sequenceDiagram
    participant CI as CI Runner
    participant Rails as Rails
    participant CW as CheckpointCallbackWorker
    participant Adapter as Messaging Adapter
    participant Slack as Slack
    participant User as User

    CI->>Rails: Save checkpoint
    Rails-->>CW: CheckpointCreatedEvent (via EventStore)
    CW->>Adapter: on_checkpoint_created(context, diff)
    Adapter->>Slack: Status update ("Searching issues...")

    Note over CI,Slack: When approval is required:
    CI->>Rails: Save checkpoint (approval_required)
    Rails-->>CW: CheckpointCreatedEvent
    CW->>Adapter: on_approval_requested(context, details)
    Adapter->>Slack: Interactive message (Approve / Reject)
    User->>Slack: Clicks "Approve"
    Slack->>Rails: Interaction payload
    Rails->>Rails: Write approval → resume workflow
```

A new `CheckpointCallbackWorker` subscribes to a `WorkflowCheckpointCreatedEvent`. It is separate from `CallbackWorker` because checkpoint events have different characteristics (high frequency, different retry semantics). Each step is event-driven; no persistent connections are needed. Approval state is persisted on the workflow record and the flow can be stopped and restarted.

### Adapter interface

The v1 adapter needs only two required methods. All other hooks are optional with no-op defaults in the base class, added when the corresponding infrastructure is built.

| Method | Purpose | Called by | Required? |
|---|---|---|---|
| `deliver_result` | Post the final answer | `CallbackWorker` | Yes |
| `deliver_error` | Post an error message | `CallbackWorker` | Yes |
| `on_flow_started` | Signal work started (e.g., 👀) | Trigger service | Optional |
| `on_flow_completed` | Signal work done (e.g., ✅) | `CallbackWorker` | Optional |
| `on_flow_failed` | Signal failure (e.g., ❌ + error) | Both | Optional |
| `on_checkpoint_created` | Intermediate progress update | `CheckpointCallbackWorker` | Optional (future) |
| `on_approval_requested` | Post approval prompt | `CheckpointCallbackWorker` | Optional (future) |

### Responsibility split: pre-flow checks vs adapter lifecycle

Platform-specific pre-flight checks (authentication, authorization, feature flags, license validation) remain in the entry-point service (e.g., `AppMentionedService` for Slack). These happen before Duo is involved and may require platform-specific responses (e.g., an OAuth authorization link for unlinked Slack users).

The adapter handles flow lifecycle only: `on_flow_started`, `on_flow_completed`, `on_flow_failed`, `deliver_result`, `deliver_error`. This keeps adapter implementations focused on delivery mechanics rather than auth logic.
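
To make the adapter contract more concrete, here is a minimal Ruby sketch of a base class with two required methods and no-op optional hooks, plus a dispatch step keyed on the `adapter` field of the callback context. Everything named here (`BaseAdapter`, `FakeSlackAdapter`, `ADAPTERS`, `handle_workload_finished`, and all method signatures) is an assumption made for illustration; the ADR does not specify class names or arguments, and the fake adapter records posts in memory instead of calling Slack.

```ruby
# Illustrative sketch only: names and signatures are assumptions,
# not GitLab's actual code.
class BaseAdapter
  # Required methods: subclasses must override both.
  def deliver_result(context, answer)
    raise NotImplementedError, "#{self.class} must implement deliver_result"
  end

  def deliver_error(context, message)
    raise NotImplementedError, "#{self.class} must implement deliver_error"
  end

  # Optional lifecycle hooks: no-op defaults.
  def on_flow_started(context); end
  def on_flow_completed(context); end
  def on_flow_failed(context, error); end
end

# Stand-in for a real Slack adapter; it records what it would post.
class FakeSlackAdapter < BaseAdapter
  attr_reader :posts

  def initialize
    @posts = []
  end

  def deliver_result(context, answer)
    @posts << [:result, context["thread_ts"], answer]
  end

  def deliver_error(context, message)
    @posts << [:error, context["thread_ts"], message]
  end

  def on_flow_completed(context)
    @posts << [:reaction, context["message_ts"], "white_check_mark"]
  end
end

# Hypothetical registry keyed by the "adapter" field of the callback context.
ADAPTERS = { "slack" => FakeSlackAdapter.new }.freeze

# Mimics what a callback worker might do once a workload has finished.
def handle_workload_finished(callback_context, answer: nil, error: nil)
  # Workflows without a callback context were not triggered from messaging.
  return :skipped if callback_context.nil?

  adapter = ADAPTERS.fetch(callback_context["adapter"])
  if error
    adapter.deliver_error(callback_context, error)
    adapter.on_flow_failed(callback_context, error)
    :failed
  else
    adapter.deliver_result(callback_context, answer)
    adapter.on_flow_completed(callback_context)
    :delivered
  end
end
```

Calling `handle_workload_finished({ "adapter" => "slack", "thread_ts" => "1234567890.123456", "message_ts" => "1234567890.123456" }, answer: "3 open MRs")` returns `:delivered` and records one result post followed by one reaction. Because the optional hooks default to no-ops, a new platform adapter in this sketch only has to implement the two delivery methods.
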

### Startup time

| Step | Today (large project) | With duo-workspace |
|---|---|---|
| Git clone | Seconds–minutes | Near-instant (empty repo) |
| Docker image | Default, pulled each time | Custom via `agent-config.yml`, cached |
| `duo-cli` install | `npm install` each run (~15s) | Pre-baked into custom image |

Prototyping showed end-to-end response times under 10 seconds with an empty workspace project. This is acceptable for async messaging. Teams can optimize further by customizing the workspace project (cached images, dedicated runners, pre-installed tools).

## Pros

- Battle-tested CI/Flows infrastructure, with no new execution runtime
- No Workhorse or DWS changes required
- ADR 004 compliant
- Every CI improvement benefits messaging for free
- Adapter pattern cleanly separates platform-specific concerns
- Workspace project is a natural customization surface (image, skills, secrets)
- Streaming and human approval extend the same architecture additively (new EventStore subscriptions and new adapter hooks, with no core changes)

## Cons

- CI startup latency (~10s with empty project) is slower than a direct service call, though acceptable for async messaging
- Auto-creating projects and service accounts adds implicit resources to namespaces
- Adapter hooks are invoked from different call sites (trigger service vs. callback worker), which requires clear documentation for new adapter authors

## Implementation

- [Issue](https://gitlab.com/gitlab-org/gitlab/-/work_items/590434)

### Feature flag

The entire flow is gated behind the [`slack_duo_agent`](https://gitlab.com/gitlab-org/gitlab/-/work_items/592185) feature flag (per-user), which already gates the `AppMentionedService`.