Add eval pipeline for agentic Vulnerability Resolution

Problem to solve

The Sec AI team is developing a new agentic vulnerability resolution feature on the agentic platform. This feature uses AI to automatically identify, prioritize, and remediate security vulnerabilities with minimal human intervention.

Agentic VR Architecture:

For details on the agentic workflow:

Overview

  • Input
    • Vulnerability ID
  • Output
    • MR link
    • MR readiness score
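The input/output contract above can be sketched as a pair of typed records. This is a minimal illustration only; the field names (`vulnerability_id`, `mr_link`, `mr_readiness_score`) and the score range are assumptions, not the workflow's actual schema:

```python
from dataclasses import dataclass

@dataclass
class WorkflowInput:
    # The only input listed in the overview above
    vulnerability_id: int

@dataclass
class WorkflowOutput:
    # Link to the merge request produced by the agent
    mr_link: str
    # Assumed to be a normalized score in [0.0, 1.0]
    mr_readiness_score: float
```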

We need to build a new evaluation pipeline specifically for this agentic approach that builds upon the existing VR evaluation framework.

Proposal

Build a new evaluation pipeline for agentic vulnerability resolution that:

  1. Uses the existing VR evaluation framework as a foundation
  2. Adds new metrics specific to the agentic approach

Evaluation Requirements

Evaluation Methodology

Dataset:

Foundation Framework:

  • Reuse existing LLM-Judge from legacy Duo VR evaluation
  • Reuse existing evaluation criteria:
    • Is vulnerability fixed?
    • Does it introduce a new vulnerability?
    • Is syntax correct?
    • Does it preserve original functionality?
New Agentic-Specific Metrics:

  • MR readiness detection score
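One way the reused criteria and the new metric could fit together in the pipeline is sketched below. The class and function names are hypothetical, as is the definition of the readiness detection score (here: agreement between the agent's self-reported readiness and a reviewer-assigned score); the four boolean criteria come directly from the list above:

```python
from dataclasses import dataclass

@dataclass
class JudgeVerdict:
    # The four criteria reused from the legacy Duo VR evaluation,
    # each a boolean verdict from the LLM judge
    vulnerability_fixed: bool
    introduces_new_vulnerability: bool
    syntax_correct: bool
    preserves_functionality: bool

def passes_legacy_criteria(v: JudgeVerdict) -> bool:
    """A fix passes only if it resolves the vulnerability without
    regressions: no new vulnerability, valid syntax, behavior preserved."""
    return (
        v.vulnerability_fixed
        and not v.introduces_new_vulnerability
        and v.syntax_correct
        and v.preserves_functionality
    )

def readiness_detection_score(predicted: float, actual: float) -> float:
    """Hypothetical agentic-specific metric: how closely the agent's
    self-reported MR readiness matches a reviewer-assigned score.
    Both inputs are assumed to be in [0.0, 1.0]."""
    return 1.0 - abs(predicted - actual)
```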

Technical Details

API Usage for Agentic VR:

curl -X POST \
    -H "Authorization: Bearer $GDK_API_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{
        "project_id": "26",
        "agent_privileges": [1, 2, 3, 4, 5],
        "goal": "Fix vulnerability ID: 772",
        "start_workflow": true,
        "workflow_definition": "resolve_sast_vulnerability/experimental",
        "environment": "web",
        "source_branch": "security/sast/resolve-vulnerability-772"
    }' \
    http://gdk.test:3000/api/v4/ai/duo_workflows/workflows
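An evaluation harness could issue the same request programmatically. The sketch below only assembles the request; the URL, headers, and JSON body mirror the curl example above, while the helper name and the idea of templating the vulnerability ID into the goal and branch name are assumptions for illustration:

```python
import json

def build_workflow_request(token: str, vulnerability_id: int,
                           base_url: str = "http://gdk.test:3000"):
    """Assemble the POST request for the Duo Workflows endpoint,
    mirroring the curl example. Returns (url, headers, body)."""
    url = f"{base_url}/api/v4/ai/duo_workflows/workflows"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "project_id": "26",
        "agent_privileges": [1, 2, 3, 4, 5],
        "goal": f"Fix vulnerability ID: {vulnerability_id}",
        "start_workflow": True,
        "workflow_definition": "resolve_sast_vulnerability/experimental",
        "environment": "web",
        "source_branch": f"security/sast/resolve-vulnerability-{vulnerability_id}",
    })
    return url, headers, body
```

Keeping request construction separate from sending makes the payload easy to assert on in pipeline tests without hitting a live GDK instance.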

Links / references

Edited by Nate Rosandich