Save failure reason to summary

What does this MR do and why?

When a Duo Agent Platform (Remote Flow) session fails at the runner level, the failure reason is now captured and saved to the workflow's summary field. This makes it easier to surface runner-level errors to users without requiring them to dig through job logs.

Previously, the summary field (introduced in !231554 (merged)) was left empty on failure. Now, when a pipeline build fails, the worker reads the build's failure_reason and passes it as a human-readable summary (e.g. "Error during Session: runner_system_failure") when transitioning the workflow to a failed state.
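
A minimal sketch of the behaviour described above, for readers who want to see the shape of the change without opening the diff. The names `FakeBuild`, `FakeWorkflow`, `drop!`, `summary_for`, and `handle_build` are hypothetical stand-ins, not the worker or model code in this MR. Only the `failure_reason` input and the "Error during Session: ..." summary format come from the description.

```ruby
# Illustrative stand-ins only; these are NOT the real GitLab models.
FakeBuild = Struct.new(:status, :failure_reason) do
  def failed?
    status == 'failed'
  end
end

FakeWorkflow = Struct.new(:status, :summary) do
  # Hypothetical transition helper: marks the workflow failed and stores the summary.
  def drop!(summary: nil)
    self.status = 'failed'
    self.summary = summary
  end
end

# Builds the human-readable summary from the build's failure_reason.
# Returns nil when there is no reason to report (assumed behaviour).
def summary_for(build)
  return nil if build.failure_reason.nil?

  "Error during Session: #{build.failure_reason}"
end

# Assumed worker logic: only failed builds transition the workflow.
def handle_build(workflow, build)
  return unless build.failed?

  workflow.drop!(summary: summary_for(build))
end

workflow = FakeWorkflow.new('running', nil)
handle_build(workflow, FakeBuild.new('failed', 'runner_system_failure'))
puts workflow.summary
```

In this sketch the summary is passed as an argument to the transition, so the status and the summary are set together.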

Follow-ups

  • An LLM-generated summary of error logs will be added in a follow-up.
  • The summary field will also be used for LLM-generated summaries of successful flows.

References

#594334

How to set up and validate locally

  1. Set up Duo Agent Platform
  2. Set up a failure for your runner. An easy way is to stop your Docker daemon. For example, if you use Colima, run: colima stop
  3. Run a Remote Flow (e.g. issue to MR)
  4. Confirm the job log shows an error and the flow didn't run
  5. After the failure, check the workflow's summary field in http://gdk.test:3000/-/graphql-explorer with the following query:

query getDuoWorkflowEvents($workflowId: AiDuoWorkflowsWorkflowID!) {
  duoWorkflowEvents(workflowId: $workflowId) {
    nodes {
      errors
      metadata
      workflowGoal
      workflowStatus
    }
  }
  duoWorkflowWorkflows(workflowId: $workflowId) {
    nodes {
      id
      status
      aiCatalogItemVersionId
      workflowDefinition
      archived
      summary
    }
  }
}

Confirm you see a summary field:

{
  "data": {
    "duoWorkflowEvents": {
      "nodes": []
    },
    "duoWorkflowWorkflows": {
      "nodes": [
        {
          "id": "gid://gitlab/Ai::DuoWorkflows::Workflow/4055",
          "status": "FAILED",
          "aiCatalogItemVersionId": "gid://gitlab/Ai::Catalog::ItemVersion/6",
          "workflowDefinition": "developer/v1",
          "archived": false,
          "summary": "Error during Session: runner_system_failure"
        }
      ]
    }
  },
  "correlationId": "01KPY6Q05BY99VBKH91Q3KHCST"
}
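
To check the response from a script rather than by eye, something like the following could work. It is a sketch that assumes the GraphQL response body is already available as a string; `failed_summaries` is a hypothetical helper name, and the sample body is trimmed from the response above.

```ruby
require 'json'

# Returns { workflow global ID => summary } for every FAILED workflow
# found in a GraphQL response body shaped like the one above.
def failed_summaries(response_body)
  nodes = JSON.parse(response_body).dig('data', 'duoWorkflowWorkflows', 'nodes') || []

  nodes.select { |node| node['status'] == 'FAILED' }
       .to_h { |node| [node['id'], node['summary']] }
end

# Sample body trimmed from the response shown above.
response = <<~BODY
  {
    "data": {
      "duoWorkflowWorkflows": {
        "nodes": [
          {
            "id": "gid://gitlab/Ai::DuoWorkflows::Workflow/4055",
            "status": "FAILED",
            "summary": "Error during Session: runner_system_failure"
          }
        ]
      }
    }
  }
BODY

failed_summaries(response).each do |id, summary|
  puts "#{id}: #{summary}"
end
```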

Or, from the rails console:

[2] pry(main)> Ai::DuoWorkflows::Workflow.find(4055).summary
  Ai::DuoWorkflows::Workflow Load (1.1ms)  SELECT "duo_workflows_workflows".* FROM "duo_workflows_workflows" WHERE "duo_workflows_workflows"."id" = 4055 LIMIT 1 /*application:console,db_config_database:gitlabhq_development,db_config_name:main,console_hostname:reisner--20250227-0XX53,console_username:reisner,line:(pry):2:in `__pry__'*/                                                                        
=> "Error during Session: runner_system_failure"

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Roman Eisner
