Duo Workflow Human in the loop design

Human in the loop design

Background

Large Language Models (LLMs) are not perfect and can sometimes start executing a series of calls that does not solve the problem at hand. In such cases, we need a human to steer the conversation. Human intervention is needed in Duo Workflow in the following scenarios:

Stopping a workflow: If the task is complete or no further actions are required.
Pausing a workflow: When human input is necessary before handing over to the agent, such as installing software, modifying files, or adjusting configurations.
Seeking approval for critical actions: The agent requires human approval before executing potentially destructive tasks (e.g., deleting files, creating issues).
Clarifying objectives: The agent may need clarification on the task or the environment.
Human guidance during execution: The human may need to steer the agent mid-process, especially during complex updates.
Reverting a change: On occasion, a human might want to undo a change made by the AI.

Note: This design does not cover reverting a change; that will be covered in future designs.

Proposal

We propose adding a "human intervention" node within the workflow to handle human inputs. This node will evaluate whether human input is needed and adjust the workflow accordingly by pausing, stopping, or allowing the agent to continue.

sequenceDiagram
    VSCodeUI->>LSP: User inputs chat message
    LSP->>Rails: POST to workflow events
    Rails->>Rails: Store Event
    DuoWorkflowService->>DuoWorkflowService: Human Intervention Node
    DuoWorkflowService->>DWExecutor: MakeHTTPRequest
    DWExecutor->>Rails: Fetch Human Events
    Rails-->>DWExecutor: Human events
    DWExecutor->>DuoWorkflowService: Action Response
    alt events present
        DuoWorkflowService->>DuoWorkflowService: Pause/Stop/Back to agent
    else No events
        DuoWorkflowService->>DuoWorkflowService: Continue execution
    end
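
The check in the sequence above can be sketched as a small function. This is only an illustration under stated assumptions: `fetch_events` is a hypothetical callable standing in for the MakeHTTPRequest round trip through the executor to Rails, the event dictionaries and their `event_type` key are an assumed shape, and the precedence of Stop over Pause is a guess rather than part of this design.

```python
from typing import Callable, Dict, List

# Hypothetical event shape, e.g. {"event_type": "pause"}.
Event = Dict[str, str]


def human_intervention_node(fetch_events: Callable[[], List[Event]]) -> str:
    """Decide the next step of the workflow from pending human events."""
    events = fetch_events()
    if not events:
        # "else No events" branch of the diagram.
        return "continue"

    types = {event["event_type"] for event in events}
    # Assumed precedence: a stop request wins over a pause request.
    if "stop" in types:
        return "stop"
    if "pause" in types:
        return "pause"
    # Any other event (for example a message) goes back to the agent.
    return "agent"
```

Passing the fetcher in as a parameter keeps the node easy to exercise without a running Rails instance.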

Workflow with Human Intervention

The following diagram illustrates the workflow with the human intervention nodes added:

stateDiagram-v2
    [*] --> Planner
    Planner --> Executor

    state Planner {
        [*] --> Agent1
        Agent1 --> Human_Intervention_Check_1
        state Human_Intervention_Check_1 <<choice>>
        Human_Intervention_Check_1 --> Agent1_Tools: if tools called
        Human_Intervention_Check_1 --> Reflection_1: if tools not called
        Human_Intervention_Check_1 --> Agent1: if human event present
        Human_Intervention_Check_1 --> Pause_1: if paused
        Human_Intervention_Check_1 --> [*]: if end
        Agent1_Tools --> Agent1
        Reflection_1 --> Agent1
        Agent1 --> Handover_1
        Handover_1 --> [*]
    }
    
    state Executor {
        [*] --> Agent2
        Agent2 --> Human_Intervention_Check_2
        state Human_Intervention_Check_2 <<choice>>
        Human_Intervention_Check_2 --> Agent2_Tools: if tools called
        Human_Intervention_Check_2 --> Reflection_2: if tools not called
        Human_Intervention_Check_2 --> Agent2: if human event present
        Human_Intervention_Check_2 --> Pause_2: if paused
        Human_Intervention_Check_2 --> [*]: if end
        Agent2_Tools --> Agent2
        Reflection_2 --> Agent2
        Agent2 --> Handover_2
        Handover_2 --> [*]
    }
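
The choice states in the diagram above could be expressed as a routing function along the following lines. This is a sketch, not the service's actual code: the `CheckInput` type, its field names, and the order in which the conditions are tested are all assumptions made for illustration. The returned strings reuse the state names from the diagram.

```python
from dataclasses import dataclass


@dataclass
class CheckInput:
    """Hypothetical summary of what the check node knows after an agent turn."""
    tools_called: bool
    human_event_present: bool = False
    paused: bool = False
    ended: bool = False


def route_after_agent(check: CheckInput, suffix: str = "1") -> str:
    """Pick the next state for Human_Intervention_Check_<suffix>."""
    # Assumed ordering: human-driven outcomes are tested before agent-driven ones.
    if check.ended:
        return "END"
    if check.paused:
        return f"Pause_{suffix}"
    if check.human_event_present:
        return f"Agent{suffix}"
    if check.tools_called:
        return f"Agent{suffix}_Tools"
    return f"Reflection_{suffix}"
```

The same function serves both the Planner (`suffix="1"`) and the Executor (`suffix="2"`), since the two sub-graphs have the same shape.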

Human Interaction Tool

Additionally, we will implement a tool that the LLM can call to request human input. This tool will block execution until a human provides a response (e.g., approve, decline, or clarify). The UI will monitor checkpoints to identify when human input is required.

sequenceDiagram
    DuoWorkflowService->>DWExecutor: Update checkpoint with human question
    DWExecutor->>Rails: Update checkpoint with human input needed
    LSP->>Rails: Fetch checkpoints
    LSP->>VSCodeUI: Display question
    Human->>VSCodeUI: Answer question
    VSCodeUI->>LSP: Respond to question
    LSP->>Rails: POST to events API
    DuoWorkflowService->>DuoWorkflowService: Call get_user_input tool
    loop until event found
        DuoWorkflowService->>DWExecutor: MakeHTTPRequest
        DWExecutor->>Rails: Fetch Human Events
        Rails-->>DWExecutor: Human events
        DWExecutor->>DuoWorkflowService: Action Response
    end
    DuoWorkflowService->>DuoWorkflowService: Update state
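
The "loop until event found" step could look roughly like the sketch below. The name `get_user_input` comes from the diagram, but its signature here is assumed: `fetch_events` is a hypothetical callable representing the request made through the executor, the `max_polls` and `sleep` parameters exist only to keep the example testable, and treating `response` and `approve` as the events that unblock the tool is an assumption.

```python
import time
from typing import Callable, Dict, List, Optional

Event = Dict[str, str]  # hypothetical shape, e.g. {"event_type": "response", "message": "..."}


def get_user_input(
    fetch_events: Callable[[], List[Event]],
    poll_interval: float = 1.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Event]:
    """Block until a human answer arrives, polling for events between waits."""
    polls = 0
    while max_polls is None or polls < max_polls:
        for event in fetch_events():
            # Assumed: only these event types answer a question from the agent.
            if event.get("event_type") in ("response", "approve"):
                return event
        polls += 1
        sleep(poll_interval)
    # Only reachable when max_polls is set and no answer arrived in time.
    return None
```

With `max_polls` left as `None` the call blocks indefinitely, which matches the blocking behaviour described above. A real implementation would likely want a timeout or cancellation path.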

The human intervention node will make an API call to Rails to check for human events, such as Pause, Stop, Approve, Response, Resume or Message. The meaning of each event is as follows:

Pause/Stop/Resume/Approve: The workflow pauses, stops, resumes, or receives the human's approval decision for a specific action.

Response: A human reply to a question asked by the agent.

Message: A general input from the user that can alter the agent's course of action.

Human Events Model

We will create a table, model, and API in Rails with the following fields:

erDiagram
    Workflow ||--o{ Checkpoint : has
    Workflow ||--o{ HumanEvent: has
    HumanEvent {
        int workflowID
        enum eventType
        string message
        bool approved
        enum eventStatus
    }

The eventStatus field will track the event's lifecycle through the following stages:

stateDiagram-v2
    [*] --> Queued
    Queued --> Processed
    Processed --> [*]
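
To make the record and its lifecycle concrete, here is a minimal sketch that mirrors the fields in the entity diagram. It is written in Python purely for illustration, since the actual table and model will live in Rails. The enum values, the snake_case field names, the defaults, and the `mark_processed` helper are assumptions.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class EventType(Enum):
    PAUSE = "pause"
    STOP = "stop"
    RESUME = "resume"
    APPROVE = "approve"
    RESPONSE = "response"
    MESSAGE = "message"


class EventStatus(Enum):
    QUEUED = "queued"
    PROCESSED = "processed"


@dataclass(frozen=True)
class HumanEvent:
    workflow_id: int
    event_type: EventType
    message: Optional[str] = None    # assumed optional: not every event carries text
    approved: Optional[bool] = None  # assumed to be set only for approval events
    event_status: EventStatus = EventStatus.QUEUED


def mark_processed(event: HumanEvent) -> HumanEvent:
    """Apply the only transition in the lifecycle: Queued -> Processed."""
    if event.event_status is not EventStatus.QUEUED:
        raise ValueError("only queued events can be marked as processed")
    return replace(event, event_status=EventStatus.PROCESSED)
```

Rejecting a second transition makes it harder for the same event to be applied to a workflow twice.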

When a human event (such as a message, pause, or stop) is received, the workflow state is updated accordingly. For example:

If a Stop event is received, the workflow ends.
If a Pause event is received, the workflow enters a paused state and moves back to the human intervention node once a Resume event is received.
If a Message event is received, a new user message is added to the conversation.
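
These state updates can be sketched as a small reducer. The `WorkflowState` type, its `status` strings, and the plain string event types are hypothetical stand-ins for whatever the workflow service uses internally.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WorkflowState:
    status: str = "running"  # assumed values: "running", "paused", "stopped"
    conversation: List[str] = field(default_factory=list)


def apply_human_event(state: WorkflowState, event_type: str, message: str = "") -> WorkflowState:
    """Update the workflow state in place for a single human event."""
    if event_type == "stop":
        state.status = "stopped"
    elif event_type == "pause":
        state.status = "paused"
    elif event_type == "resume" and state.status == "paused":
        # Resuming only makes sense from a paused workflow.
        state.status = "running"
    elif event_type == "message":
        state.conversation.append(message)
    return state
```

For example, applying `pause`, then `message`, then `resume` leaves the workflow running with one new message in its conversation.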

Edited Oct 28, 2024 by Shekhar Patnaik