fix: Add explicit output format constraints to SAST FP detection prompts

What does this merge request do and why?

This MR fixes a response format regression in the SAST false positive detection flow where the agent was wrapping JSON responses in explanatory text instead of returning raw JSON.

Problem

Previously, the agent returned:

Based on my comprehensive analysis of the SAST finding, here is my assessment:

\`\`\`json
{
  "false_positive_likelihood": 95,
  "explanation": "..."
}
\`\`\`

Now it should return:

{
  "false_positive_likelihood": 95,
  "explanation": "..."
}
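
The difference between the two response shapes matters to any consumer that parses the output strictly. A minimal sketch of such a check, assuming the consumer expects a bare JSON object with the two keys shown above (the function name `parse_raw_json_response` and the `REQUIRED_KEYS` constant are hypothetical and not taken from the codebase):

```python
import json

# Keys taken from the example response above; the real schema may include more.
REQUIRED_KEYS = {"false_positive_likelihood", "explanation"}


def parse_raw_json_response(response: str) -> dict:
    """Parse an agent response that must be a bare JSON object.

    Raises ValueError if the response carries a preamble, markdown
    fences, or is missing one of the expected keys.
    """
    try:
        # json.loads tolerates surrounding whitespace but nothing else,
        # so any explanatory text or code fence makes it fail.
        data = json.loads(response)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not raw JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")

    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"response is missing keys: {sorted(missing)}")
    return data


if __name__ == "__main__":
    raw = '{"false_positive_likelihood": 95, "explanation": "..."}'
    print(parse_raw_json_response(raw)["false_positive_likelihood"])

    wrapped = "Based on my comprehensive analysis, here is my assessment:\n" + raw
    try:
        parse_raw_json_response(wrapped)
    except ValueError as err:
        print(f"rejected: {err}")
```

Under these assumptions the raw response parses cleanly, while the response with a preamble is rejected, which is the failure mode this MR addresses.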

Root Cause

The tool_output_security directive appeared twice in the system prompt. The doubled emphasis on careful output handling led the model to prepend an explanatory preamble to the JSON response.

Changes

  1. System Prompt (sast_fp_detection_agent_prompt/system/1.0.0.jinja):

    • Removed the duplicate tool_output_security directive (kept only one)

  2. User Prompt (sast_fp_detection_agent_prompt/user/1.0.0.jinja):

    • Added an explicit CRITICAL OUTPUT FORMAT REQUIREMENT section at the beginning
    • Specifies that the agent MUST output ONLY valid JSON with no explanatory text, preamble, or markdown formatting
    • Clarifies that the JSON should be output directly, without code blocks or markdown

Testing locally

# Replace TOKEN, PROJECT_ID, and VULNERABILITY_ID with real values.
curl -X POST \
  -H "Authorization: Bearer TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "PROJECT_ID",
    "agent_privileges": [1, 2, 3, 4, 5],
    "pre_approved_agent_privileges": [1, 2, 3, 4, 5],
    "goal": "VULNERABILITY_ID",
    "start_workflow": true,
    "workflow_definition": "sast_fp_detection/v1",
    "environment": "web",
    "allow_agent_to_request_user": false
  }' \
  http://host.docker.internal:3000/api/v4/ai/duo_workflows/workflows
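
For scripted testing, the same request could be built in Python so the JSON body does not need manual quoting. This is only a sketch using the standard library: the helper name `build_workflow_request` and its parameters are hypothetical, while the URL path, headers, and payload fields are copied from the curl command above.

```python
import json
import urllib.request


def build_workflow_request(base_url, token, project_id, vulnerability_id):
    """Build (but do not send) the POST request that starts the workflow."""
    payload = {
        "project_id": project_id,
        "agent_privileges": [1, 2, 3, 4, 5],
        "pre_approved_agent_privileges": [1, 2, 3, 4, 5],
        "goal": vulnerability_id,
        "start_workflow": True,
        "workflow_definition": "sast_fp_detection/v1",
        "environment": "web",
        "allow_agent_to_request_user": False,
    }
    return urllib.request.Request(
        url=f"{base_url}/api/v4/ai/duo_workflows/workflows",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_workflow_request(
        "http://host.docker.internal:3000", "TOKEN", "PROJECT_ID", "VULNERABILITY_ID"
    )
    # Sending is left commented out because it needs a running local instance:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode("utf-8"))
    print(request.get_method(), request.full_url)
```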

gitlab-org/modelops/ai-model-validation-and-research/ai-evaluation/prompt-library#817

Edited by Nate Rosandich
