Add eval pipeline for agentic Vulnerability Resolution

Problem to solve

The Sec AI team is developing a new agentic vulnerability resolution (VR) feature on the agentic platform. This feature uses AI to automatically identify, remediate, and prioritize security vulnerabilities with minimal human intervention.

Agentic VR Architecture:

For details on the agentic workflow:

Overview

Input:
- Vulnerability ID

Output:
- MR link
- MR readiness score
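
To make the input/output contract concrete for the pipeline, here is a minimal sketch of the records an evaluation run could pass around. The class names, field names, and the helper `goal_for` are hypothetical, and the numeric type of the readiness score is an assumption, since this issue does not specify its type or range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgenticVRInput:
    """What the evaluation feeds into the agentic flow."""
    vulnerability_id: int


@dataclass(frozen=True)
class AgenticVROutput:
    """What the evaluation reads back from the agentic flow."""
    mr_link: str
    # Assumption: the readiness score is numeric; its real type and range are not specified here.
    mr_readiness_score: float


def goal_for(item: AgenticVRInput) -> str:
    """Build a goal string in the same format as the API example in this issue."""
    return f"Fix vulnerability ID: {item.vulnerability_id}"
```

For instance, `goal_for(AgenticVRInput(772))` yields the goal string used in the request under Technical Details.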

We need to build a new evaluation pipeline specifically for this agentic approach that builds upon the existing VR evaluation framework.

Proposal

Build a new evaluation pipeline for agentic vulnerability resolution that:

- Uses the existing VR evaluation framework as a foundation
- Adds new metrics specific to the agentic approach

Evaluation Requirements

Evaluation Methodology

Dataset:

- Use the existing vulnerability dataset from https://staging.gitlab.com/ai-evaluation/etv
- Leverage the existing VR evaluation infrastructure

Foundation Framework:

- Reuse the existing LLM-Judge from the legacy Duo VR evaluation
- Reuse the existing evaluation criteria:
  - Is the vulnerability fixed?
  - Does the fix introduce a new vulnerability?
  - Is the syntax correct?
  - Does the fix preserve the original functionality?

New Agentic-Specific Metrics:
- MR readiness detection score
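
One possible way to combine the four reused criteria with the new metric is sketched below. This is not the existing evaluator's interface: `JudgeVerdict`, `is_acceptable`, and `readiness_detection_accuracy` are hypothetical names. The sketch assumes that readiness scores lie in [0, 1], that a fix is acceptable only when all four criteria pass, and that the detection score is measured as agreement between the agent's readiness call and the judge.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class JudgeVerdict:
    """One LLM-Judge result for one generated MR (hypothetical shape)."""
    vulnerability_fixed: bool
    introduces_new_vulnerability: bool
    syntax_correct: bool
    preserves_functionality: bool

    def is_acceptable(self) -> bool:
        # Assumption: a fix counts as acceptable only if all four criteria pass.
        return (
            self.vulnerability_fixed
            and not self.introduces_new_vulnerability
            and self.syntax_correct
            and self.preserves_functionality
        )


def readiness_detection_accuracy(
    samples: Iterable[Tuple[JudgeVerdict, float]], threshold: float = 0.5
) -> float:
    """Fraction of samples where the agent's readiness call matches the judge.

    Each sample pairs a judge verdict with the agent's MR readiness score.
    Assumption: scores lie in [0, 1] and `threshold` splits ready from not ready.
    """
    samples = list(samples)
    if not samples:
        raise ValueError("need at least one sample")
    hits = sum(
        (score >= threshold) == verdict.is_acceptable()
        for verdict, score in samples
    )
    return hits / len(samples)
```

Plain agreement is only one candidate definition. If the readiness score is meant to be used as a ranking signal, a threshold-free measure may be a better fit.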

Technical Details

API Usage for Agentic VR:

curl -X POST \
    -H "Authorization: Bearer $GDK_API_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{
        "project_id": "26",
        "agent_privileges": [1, 2, 3, 4, 5],
        "goal": "Fix vulnerability ID: 772",
        "start_workflow": true,
        "workflow_definition": "resolve_sast_vulnerability/experimental",
        "environment": "web",
        "source_branch": "security/sast/resolve-vulnerability-772"
    }' \
    http://gdk.test:3000/api/v4/ai/duo_workflows/workflows
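
For the pipeline itself, the same request could be issued from Python once per dataset entry. The sketch below mirrors the curl call above using only the standard library. The function names are hypothetical. Two things are assumptions: that the source branch name follows the pattern in the example, and that the endpoint responds with a JSON body.

```python
import json
import urllib.request

WORKFLOWS_URL = "http://gdk.test:3000/api/v4/ai/duo_workflows/workflows"


def build_workflow_request(
    project_id: str, vulnerability_id: int, token: str
) -> urllib.request.Request:
    """Build the POST request that starts an agentic VR workflow."""
    payload = {
        "project_id": project_id,
        "agent_privileges": [1, 2, 3, 4, 5],
        "goal": f"Fix vulnerability ID: {vulnerability_id}",
        "start_workflow": True,
        "workflow_definition": "resolve_sast_vulnerability/experimental",
        "environment": "web",
        # Assumption: the branch name follows the pattern from the curl example.
        "source_branch": f"security/sast/resolve-vulnerability-{vulnerability_id}",
    }
    return urllib.request.Request(
        WORKFLOWS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def start_workflow(request: urllib.request.Request) -> dict:
    """Send the request and return the parsed response.

    Assumption: the endpoint answers with a JSON object.
    """
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `start_workflow(build_workflow_request("26", 772, token))` with the value of `GDK_API_TOKEN` as `token` would reproduce the curl example. Keeping request construction separate from sending makes the payload easy to unit test without a running GDK.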

Links / references

Primary evaluation requirements: https://gitlab.com/gitlab-org/gitlab/-/issues/553304
Agentic VR implementation MR: feat: add flow for SAST vulnerability resolution (gitlab-org/modelops/applied-ml/code-suggestions/ai-assist!3171 - merged)
Existing VR evaluation foundation: https://gitlab.com/gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/prompt-library/-/blob/main/cef/vulnerability_resolution/evaluators/resolution_quality.py

Edited Oct 06, 2025 by Nate Rosandich