Draft: Add eval pipeline for Agentic Vulnerability Resolution: inference (1/2)
What does this merge request do and why?
This MR creates a new evaluation pipeline for agentic vulnerability resolution. The Sec AI team has released a new experimental feature that automatically resolves vulnerabilities using an agentic approach, but currently there is no evaluation pipeline to assess its performance. This MR provides the infrastructure to capture predictions and make them available for further evaluation.
Scope clarification
This MR is part 1 of 2:
- Part 1 (this MR): Collect predictions (inference) from the feature.
- Part 2 (upcoming): Evaluate and assess the quality of those predictions.
How to set up and validate locally
- Perform these setup steps:
  - Set up the executor on your own laptop, and specify the path to your executor in .gitlab/agent_platform_templates/vulnerability_resolution.yml.
  - Open your .env file and set:
    - GITLAB_BASE_URL to http://gdk-for-eval-dbecaae2.gitlab-evaluation-runner.com:3000
    - GITLAB_PRIVATE_TOKEN to the GitLab private token defined in this line
    Note: please also verify that LANGCHAIN_API_KEY, LANGCHAIN_PROJECT, and ANTHROPIC_API_KEY are set.
- Run the evaluation command as follows:
  poetry run cef agent-platform evaluate .gitlab/agent_platform_templates/vulnerability_resolution.yaml
The command outputs a LangSmith experiment link where you can view the predictions (code patches) for each vulnerability in the dataset. See this experiment's results for an example.
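If the command exits before printing an experiment link, a missing variable from the setup steps is one likely cause. The function below is a minimal pre-flight sketch and is not part of this MR: check_env is a hypothetical name, and the usage comment assumes your .env file holds plain KEY=value lines that a shell can source. It only reports which of the variables named above are unset or empty in the current shell.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of this MR): report any variable
# from the setup steps that is unset or empty in the current shell.
check_env() {
  local var missing=0
  for var in GITLAB_BASE_URL GITLAB_PRIVATE_TOKEN \
             LANGCHAIN_API_KEY LANGCHAIN_PROJECT ANTHROPIC_API_KEY; do
    # ${!var} is bash indirect expansion: the value of the variable named $var.
    if [ -z "${!var:-}" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage, assuming .env holds plain KEY=value lines:
#   set -a; . ./.env; set +a   # export everything defined in .env
#   check_env && poetry run cef agent-platform evaluate \
#     .gitlab/agent_platform_templates/vulnerability_resolution.yaml
```

Sourcing .env here is only for the check; it does not change how the evaluation command itself reads its configuration.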
If you open an example in the LangSmith experiment, you should see the code patch (i.e., the agent's prediction) for that vulnerability.
Important note 1: The workflow may be unable to generate code patches for certain vulnerabilities. When patch generation fails, you'll see this specific log message:
No fix branch created for vuln_id=XXX, workflow_id=XX (likely no patch generated).
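To see at a glance which vulnerabilities ended without a patch, the run's output can be filtered for that message. The sketch below is hypothetical and not part of this MR: list_unpatched is an illustrative name, and it assumes the command's output was saved to a file (for example with tee) and that each failure line follows exactly the format quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this MR): list the vuln_ids for which the
# workflow logged "No fix branch created", i.e. no patch was generated.
list_unpatched() {
  local log_file="$1"
  # Keep only the "... vuln_id=<id>" fragment, strip the prefix, de-duplicate.
  grep -o 'No fix branch created for vuln_id=[^,]*' "$log_file" \
    | sed 's/.*vuln_id=//' \
    | sort -u
}

# Example usage, assuming the run's output was captured to eval.log:
#   poetry run cef agent-platform evaluate \
#     .gitlab/agent_platform_templates/vulnerability_resolution.yaml 2>&1 | tee eval.log
#   list_unpatched eval.log
```

The IDs are sorted as strings, not numbers; pipe through sort -n instead if the IDs are numeric and numeric order matters.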
Important note 2: This MR uses the LangSmith dataset vulnerability.resolution.3.subset, which is configured in the .yaml input file.
References
- Parent issue: #791
Merge request checklist
- I've run the affected pipeline(s) to validate that nothing is broken.
- Tests added for new functionality. If not, please raise an issue to follow up.
- Documentation added/updated, if needed.