Vulnerability Resolution - Iterate on the Prompt - Short-Loop Evaluation

Preferred option - Running `prompt-library` on GDK

Implementation plan:

Seed the GDK with projects that contain vulnerabilities (extend GitLab Direct Transfer)
Export GDK vulnerabilities to JSONL format
- Wait for gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/prompt-library!827 to be merged (export from GitLab to v4): "feat: move vulnerability extraction to Prompt L..." by Andras Herczeg, milestone 17.6, status: merged
- Wait for additional extraction steps (from v4 to v7)
- See https://gitlab.com/gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/prompt-library/-/work_items/378#note_2185995609
Filter the JSONL (to select only a subset, e.g. by CWE, by language, ...)
Run Prompt Library (input = JSONL file, output = JSONL file)
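As a rough illustration of the filtering step, the sketch below streams a JSONL export and keeps only the records that match a set of CWEs and/or languages. It assumes each line is one JSON object with hypothetical top-level `cwe` and `language` fields; the actual v7 schema produced by the extraction steps may differ (for example, several CWE identifiers per vulnerability), so the field access would need adapting.

```python
import json
from typing import Iterable, Iterator, Optional, Set


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one JSON object per non-blank line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def filter_vulnerabilities(
    records: Iterable[dict],
    cwes: Optional[Set[str]] = None,
    languages: Optional[Set[str]] = None,
) -> Iterator[dict]:
    """Keep records matching the given CWEs and languages (None = no constraint).

    The 'cwe' and 'language' field names are hypothetical placeholders.
    """
    for record in records:
        if cwes is not None and record.get("cwe") not in cwes:
            continue
        if languages is not None and record.get("language") not in languages:
            continue
        yield record


def filter_jsonl_file(
    src: str,
    dst: str,
    cwes: Optional[Set[str]] = None,
    languages: Optional[Set[str]] = None,
) -> int:
    """Stream src -> dst line by line, writing matching records; return how many were kept."""
    kept = 0
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for record in filter_vulnerabilities(parse_jsonl(fin), cwes, languages):
            fout.write(json.dumps(record) + "\n")
            kept += 1
    return kept
```

For example, `filter_jsonl_file("vulns.jsonl", "subset.jsonl", cwes={"CWE-79"}, languages={"python"})` would write only the Python records tagged CWE-79 (file names hypothetical). Processing line by line keeps memory use flat even for a large export, and the output file can be fed directly to the Prompt Library run.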

Original Description

We are close to having CEF in place for Vulnerability Resolution, which allows us to assess the feature's quality.
This assessment provides detailed insights into the feature's performance.

Our next goal is to improve the feature's quality, as reflected in the CEF indicators.
These improvements will primarily involve modifications to the prompt.

When adjusting the prompt, we must ensure that no significant regressions are introduced.
Ideally, we would like to run the evaluation on the branch before merging.
However, running the evaluation on the whole dataset takes too long (~48 hours).
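One way to make "no significant regression" concrete for a short loop is to compare per-metric mean scores between a baseline run (main) and a candidate run (branch) on the same small subset. The sketch below assumes scores are available as a mapping from data-point ID to `{metric: score}`; the metric names and the `tolerance` threshold are hypothetical and are not CEF's actual indicators.

```python
from statistics import mean
from typing import Dict, Set, Tuple

# Hypothetical shape: data-point id -> {metric name: score}
Scores = Dict[str, Dict[str, float]]


def find_regressions(
    baseline: Scores, candidate: Scores, tolerance: float = 0.05
) -> Dict[str, Tuple[float, float]]:
    """Return {metric: (baseline_mean, candidate_mean)} for metrics whose mean
    dropped by more than `tolerance`, using only data points scored in both runs."""
    shared_ids = sorted(baseline.keys() & candidate.keys())
    if not shared_ids:
        raise ValueError("baseline and candidate share no data points")

    # Collect every metric that appears on both sides for at least one data point.
    metrics: Set[str] = set()
    for pid in shared_ids:
        metrics |= baseline[pid].keys() & candidate[pid].keys()

    regressions: Dict[str, Tuple[float, float]] = {}
    for metric in sorted(metrics):
        ids = [p for p in shared_ids if metric in baseline[p] and metric in candidate[p]]
        before = mean(baseline[p][metric] for p in ids)
        after = mean(candidate[p][metric] for p in ids)
        if before - after > tolerance:
            regressions[metric] = (before, after)
    return regressions
```

Only data points present in both runs are compared, so a data point that failed on one side does not skew the means. On a small subset a mean can hide a large drop on a single data point, so a per-data-point check could complement this.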

The goal of this issue is to set up a process for evaluating prompt changes before merging.

Several options have been considered:

Simulation-based LLM Judge in prompt-library
Evaluation in LangSmith
Running prompt-library on GDK

Also, the CEF will be able to pinpoint data points that need particular attention.
We need a way to run a local evaluation on just those data points.
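For the data points flagged by CEF, a local run only needs a JSONL file containing exactly those records. A minimal sketch, assuming each record carries a unique identifier under a hypothetical `id` field and that CEF yields a list of such IDs:

```python
from typing import Dict, Iterable, List, Tuple


def select_by_ids(
    records: Iterable[dict], ids: List[str], id_field: str = "id"
) -> Tuple[List[dict], List[str]]:
    """Return (selected records in the order of `ids`, ids not found).

    `id_field` is a hypothetical field name; duplicate ids are ignored.
    """
    ordered = list(dict.fromkeys(ids))  # de-duplicate while keeping order
    wanted = set(ordered)
    found: Dict[str, dict] = {}
    for record in records:
        key = record.get(id_field)
        if key in wanted and key not in found:
            found[key] = record
    selected = [found[i] for i in ordered if i in found]
    missing = [i for i in ordered if i not in found]
    return selected, missing
```

Returning the missing IDs makes it visible when a flagged data point is absent from the local GDK export, rather than silently evaluating a smaller set.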

Note: a particularity of the VR LLM Judge is that it does not judge the LLM's output directly; it judges the output of VR, including the creation of the MR (applying the LLM's suggestion to the actual code).

Edited Oct 30, 2024 by Meir Benayoun
