Duo VR - Manual Assessment of Injection CWEs

We need to conduct a manual review of our Vulnerability Resolution (VR) capabilities for currently unsupported Common Weakness Enumeration (CWE) categories (with a focus on injection vulnerabilities). This assessment will help us expand our supported CWE list.

Objectives

Evaluate our current VR performance in fixing injection-related CWEs.
Identify CWEs that can be immediately added to our supported list.
Determine areas for improvement in our VR process for injection vulnerabilities.
Develop a detailed implementation plan to enhance our injection vulnerability support.

Scope

Focus primarily on GLAS (GitLab Advanced SAST) findings
Include a few Semgrep findings for a comprehensive assessment
Focus on injection-related CWEs, not currently on our supported list

Expected Outcomes

A list of CWEs that can be immediately added to our supported list without additional effort, hopefully not empty.
A comprehensive report detailing:
- Current performance for each assessed CWE
- Identified gaps in our VR process
- Recommendations for improvements
A prioritized implementation plan outlining:
- Specific enhancements needed (prompt, context, evaluation, ...)
- Estimated effort and timeline for each improvement
- Dependencies and potential challenges

Implementation Plan

Assess the following CWE Counts from BQ Data and place results in the Manual Assessment Claude 3.7 google sheet:

CWE	Title	Count	# Assessed	% Assessed	User
CWE-22	Path traversal	1			@mbenayoun
CWE-22	Improper limitation of a pathname to a restricted directory ('Path Traversal')	53	16	30	@mbenayoun
CWE-78	Improper neutralization of special elements used in an OS Command ('OS Command Injection')	7	7	100	@mbenayoun
CWE-79	Improper neutralization of input during web page generation ('Cross-site Scripting')	95	15	16	@adamcohen
CWE-88	Improper neutralization of argument delimiters in a command ('Argument Injection')	1	1	100	@mbenayoun
CWE-89	Improper neutralization of special elements used in an SQL command ('SQL Injection')	7	7	100	@mbenayoun
CWE-918	Server Side Request Forgery (SSRF)	1	1	100	@mbenayoun
CWE-918	Server-Side Request Forgery (SSRF)	10	10	100	@mbenayoun
CWE-94	Improper control of generation of code ('Code Injection')	48	11	22	@adamcohen
CWE-94	Improper control of generation of code (Code Injection)	1			@adamcohen
CWE-95	Improper neutralization of directives in dynamically evaluated code ('Eval Injection')	72	18	25	@mbenayoun

	Totals	296

		144			@adamcohen
		152			@mbenayoun

SQL used for generating the Manual Assessment Claude 3.7 google sheet:

Click to expand

SELECT
  cwe,
  title,
  vulnerability_id,
  code,
  answer,
  resolve_vulnerability,
  resolve_vulnerability_confidence,
  do_not_introduce_vulnerability,
  do_not_introduce_vulnerability_confidence,
  preserve_functionality,
  preserve_functionality_confidence,
  correct_syntax,
  correct_syntax_confidence,
  overall_assessment,
  overall_assessment,
  explanation,
  CASE
    WHEN cwe in ('CWE-79', 'CWE-94') THEN 'adamcohen'
    ELSE 'mbenayoun'
  END AS evaluated_by
FROM `dev-ai-research-0e2f8974.resolve_vulnerability_investigation.claude_3_7_inspection`
WHERE scanner_name = "GitLab Advanced SAST"
  AND explanation != "False positive"
  AND NOT answer LIKE "[Failure]%"
  AND cwe IN ('CWE-22','CWE-78','CWE-79','CWE-88','CWE-89','CWE-918','CWE-94','CWE-95')
ORDER BY cwe ASC;

Results of the manual review

CWE-89 - SQL Injection

7 results have been reviewed
- 3 of them are perfect/excellent fixes
- 3 of them are effected by the "single blank line" bug (Duo VR - Generated MR only adds a new line at e... (#526882)), so they cannot be evaluated
- 1 of them is a detected FP for which an MR has been mistakenly created (see Duo VR - Incorrect handling of detected FP (#529951))

Meaningful reviewed items: 3
Accuracy: 100%

CWE-78 - Command Injection

7 results have been reviewed
- 4 of them are perfect fix
- 1 of them is very good
- 2 of them are mitigating the risk in a limited way

Meaningful reviewed items: 7
Accuracy: 82%

CWE-918 - SSRF

11 results have been reviewed
- 1 of them is making the vulnerable code even worse
- 2 of them don't address the vulnerability
- 1 of them fixes another CMDi vulnerability but not the SSRF
- 7 of them are insufficient partial remediations

Meaningful reviewed items: 11
Accuracy: 34%

CWE-95 - Eval Injection

18 results have been reviewed
- 3 of them have been fixed correctly, but it could have been fixed in a slightly safer way
- 1 of them was an FP, but the code has been improved to be safer
- 12 of them don't address the vulnerability
- 2 of them are affected by the "single blank line" bug (Duo VR - Generated MR only adds a new line at e... (#526882)), so they cannot be evaluated

Eval Injection is a critical vulnerability.
However, most of the results under CWE-95 are related to the low-severity Improper neutralization of directives in dynamically evaluated code (like https://staging.gitlab.com/ai-evaluation/etv/eyeballvul-sequelize-0057668/-/security/vulnerabilities/7579612), which is a very different vulnerability and very difficult to fix in an automated way.
We should consider changing the corresponding rules. I think that Improper neutralization of directives in dynamically evaluated code should not be marked as CWE-95.
Duo VR cannot support CWE-95 until Improper neutralization of directives in dynamically evaluated code are polluting this category.

Meaningful reviewed items: 16
Accuracy: 31%

CWE-88 - Parameter Pollution (Argument Injection)

1 result has been reviewed
- it's a good fix

I'm optimistic about the accuracy of this category, but we have only a single result in the dataset, we don't have enough data to make a data-driven decision.

Meaningful reviewed items: 1
Accuracy: 100%

CWE-22 - Path Traversal

16 results have been reviewed
- 12 of them are either partial fix or unrelated to the path traversal vulnerability
- 4 of them are affected by the "single blank line" bug (Duo VR - Generated MR only adds a new line at e... (#526882)), so they cannot be evaluated

Meaningful reviewed items: 12
Accuracy: 31%

CWE-79 - XSS

15 results have been reviewed
- 2 of them are perfect fix
- 13 of them don't address the vulnerability or break the existing functionality

Notice that some of the findings have been found in the internal of Django.
As the internals of Django are defining the safe abstractions, the implementation is inherently using unsafe functions. The LLM is then trying to solve an unsolvable problem and produces garbage.

Meaningful reviewed items: 15
Accuracy: 20%

CWE-94 - Code Injection

11 results have been reviewed
- 3 of them are very good fixes
- 4 of them are good fixes, but possibly breaking the functionality
- 4 of them probably break the functionality

Notice that some of the findings are SSTI.
The vulnerability description of Code Injection through eval and SSTI are quite different, so that's unfortunate that the vulnerability description is not given to the LLM Duo VR - Missing vulnerability_description input (#526865 - closed).

Notice that we have a lot of FP results for improper use of setTimeout.
A bug has been reported here: FP in javascript-lang-codei-taint: setTimeout(f... (#531041 - closed)

Meaningful reviewed items: 11
Accuracy: 63%

Edited Apr 11, 2025 by Adam Cohen