Add eval pipeline for agentic Vulnerability Resolution
Problem to solve
The Sec AI team is developing a new agentic vulnerability resolution feature using the agentic platform. This feature uses AI to automatically identify, remediate, and prioritize security vulnerabilities with minimal human intervention.
Agentic VR Architecture:
For details on the agentic workflow:
- https://gitlab.com/gitlab-org/gitlab/-/issues/556989+
- feat: add flow for SAST vulnerability resolution (gitlab-org/modelops/applied-ml/code-suggestions/ai-assist!3171 - merged)
Overview
- Input
  - Vulnerability ID
- Output
  - MR link
  - MR readiness score
We need to build a new evaluation pipeline specifically for this agentic approach that builds upon the existing VR evaluation framework.
Proposal
Build a new evaluation pipeline for agentic vulnerability resolution that:
- Uses the existing VR evaluation framework as a foundation
- Adds new metrics specific to the agentic approach
Evaluation Requirements
Evaluation Methodology
Dataset:
- Use existing vulnerability dataset from https://staging.gitlab.com/ai-evaluation/etv
- Leverage existing VR evaluation infrastructure
Foundation Framework:
- Reuse the existing LLM-Judge from the legacy Duo VR evaluation
- Reuse the existing evaluation criteria:
  - Is the vulnerability fixed?
  - Does it introduce a new vulnerability?
  - Is the syntax correct?
  - Does it preserve the original functionality?
New Agentic-Specific Metrics:
- MR readiness detection score
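One way to picture how the reused criteria and the new metric could sit together for each vulnerability is the record below. This is a minimal sketch, not the actual schema of the existing VR evaluation framework: the class name `AgenticVRResult`, its field names, the `pass_rate` helper, and the use of a float for the MR readiness score are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AgenticVRResult:
    """Hypothetical per-vulnerability evaluation record (names are assumptions)."""

    vulnerability_id: int
    # The four criteria reused from the existing VR evaluation.
    vulnerability_fixed: bool
    introduces_new_vulnerability: bool
    syntax_correct: bool
    preserves_functionality: bool
    # New agentic-specific metric; its scale is not defined in this issue.
    mr_readiness_score: float

    def passes_existing_criteria(self) -> bool:
        # A fix passes only if it resolves the issue without side effects.
        return (
            self.vulnerability_fixed
            and not self.introduces_new_vulnerability
            and self.syntax_correct
            and self.preserves_functionality
        )


def pass_rate(results: List[AgenticVRResult]) -> float:
    """Fraction of results that satisfy all four reused criteria."""
    if not results:
        return 0.0
    passed = sum(1 for r in results if r.passes_existing_criteria())
    return passed / len(results)
```

Keeping the readiness score as a separate field, rather than folding it into the pass/fail check, leaves room to decide later how it should be compared against the judge's verdicts.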
Technical Details
API Usage for Agentic VR:
curl -X POST \
  -H "Authorization: Bearer $GDK_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "26",
    "agent_privileges": [1, 2, 3, 4, 5],
    "goal": "Fix vulnerability ID: 772",
    "start_workflow": true,
    "workflow_definition": "resolve_sast_vulnerability/experimental",
    "environment": "web",
    "source_branch": "security/sast/resolve-vulnerability-772"
  }' \
  http://gdk.test:3000/api/v4/ai/duo_workflows/workflows
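For a pipeline that iterates over many vulnerabilities, the same request could be built in Python. The sketch below only constructs the request object and does not send it. The helper name `build_workflow_request` is hypothetical, the payload values are copied from the curl example, and the branch-name pattern is inferred from that single example, so treat it as an assumption.

```python
import json
import urllib.request


def build_workflow_request(
    vulnerability_id: int,
    project_id: str,
    token: str,
    base_url: str = "http://gdk.test:3000",
) -> urllib.request.Request:
    """Hypothetical helper: build the POST request shown in the curl example."""
    payload = {
        "project_id": project_id,
        "agent_privileges": [1, 2, 3, 4, 5],
        "goal": f"Fix vulnerability ID: {vulnerability_id}",
        "start_workflow": True,
        "workflow_definition": "resolve_sast_vulnerability/experimental",
        "environment": "web",
        # Assumption: branch names follow the pattern seen in the example.
        "source_branch": f"security/sast/resolve-vulnerability-{vulnerability_id}",
    }
    return urllib.request.Request(
        url=f"{base_url}/api/v4/ai/duo_workflows/workflows",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

A caller could then pass the result to `urllib.request.urlopen` to start the workflow, for example `urllib.request.urlopen(build_workflow_request(772, "26", token))`, with the token read from the `GDK_API_TOKEN` environment variable.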
Links / references
- Primary evaluation requirements: https://gitlab.com/gitlab-org/gitlab/-/issues/553304+
- Agentic VR implementation MR: feat: add flow for SAST vulnerability resolution (gitlab-org/modelops/applied-ml/code-suggestions/ai-assist!3171 - merged)
- Existing VR evaluation foundation: https://gitlab.com/gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/prompt-library/-/blob/main/cef/vulnerability_resolution/evaluators/resolution_quality.py
Edited by Nate Rosandich