Duo VR - Manual Assessment of Injection CWEs
We need to conduct a manual review of our Vulnerability Resolution (VR) capabilities for currently unsupported Common Weakness Enumeration (CWE) categories (with a focus on injection vulnerabilities). This assessment will help us expand our supported CWE list.
Objectives
- Evaluate our current VR performance in fixing injection-related CWEs.
- Identify CWEs that can be immediately added to our supported list.
- Determine areas for improvement in our VR process for injection vulnerabilities.
- Develop a detailed implementation plan to enhance our injection vulnerability support.
Scope
- Focus primarily on GLAS (GitLab Advanced SAST) findings
- Include a few Semgrep findings for a comprehensive assessment
- Focus on injection-related CWEs, not currently on our supported list
Expected Outcomes
- A list of CWEs that can be immediately added to our supported list without additional effort, hopefully not empty.
- A comprehensive report detailing:
- Current performance for each assessed CWE
- Identified gaps in our VR process
- Recommendations for improvements
- A prioritized implementation plan outlining:
- Specific enhancements needed (prompt, context, evaluation, ...)
- Estimated effort and timeline for each improvement
- Dependencies and potential challenges
Implementation Plan
Assess the following CWE Counts from BQ Data and place results in the Manual Assessment Claude 3.7 google sheet:
CWE | Title | Count | # Assessed | % Assessed | User |
---|---|---|---|---|---|
CWE-22 | Path traversal | 1 | @mbenayoun | ||
CWE-22 | Improper limitation of a pathname to a restricted directory ('Path Traversal') | 53 | 16 | 30 | @mbenayoun |
CWE-78 | Improper neutralization of special elements used in an OS Command ('OS Command Injection') | 7 | 7 | 100 | @mbenayoun |
CWE-79 | Improper neutralization of input during web page generation ('Cross-site Scripting') | 95 | 15 | 16 | @adamcohen |
CWE-88 | Improper neutralization of argument delimiters in a command ('Argument Injection') | 1 | 1 | 100 | @mbenayoun |
CWE-89 | Improper neutralization of special elements used in an SQL command ('SQL Injection') | 7 | 7 | 100 | @mbenayoun |
CWE-918 | Server Side Request Forgery (SSRF) | 1 | 1 | 100 | @mbenayoun |
CWE-918 | Server-Side Request Forgery (SSRF) | 10 | 10 | 100 | @mbenayoun |
CWE-94 | Improper control of generation of code ('Code Injection') | 48 | 11 | 22 | @adamcohen |
CWE-94 | Improper control of generation of code (Code Injection) | 1 | @adamcohen | ||
CWE-95 | Improper neutralization of directives in dynamically evaluated code ('Eval Injection') | 72 | 18 | 25 | @mbenayoun |
Totals | 296 | ||||
144 | @adamcohen | ||||
152 | @mbenayoun |
SQL used for generating the Manual Assessment Claude 3.7 google sheet
:
Click to expand
SELECT
cwe,
title,
vulnerability_id,
code,
answer,
resolve_vulnerability,
resolve_vulnerability_confidence,
do_not_introduce_vulnerability,
do_not_introduce_vulnerability_confidence,
preserve_functionality,
preserve_functionality_confidence,
correct_syntax,
correct_syntax_confidence,
overall_assessment,
overall_assessment,
explanation,
CASE
WHEN cwe in ('CWE-79', 'CWE-94') THEN 'adamcohen'
ELSE 'mbenayoun'
END AS evaluated_by
FROM `dev-ai-research-0e2f8974.resolve_vulnerability_investigation.claude_3_7_inspection`
WHERE scanner_name = "GitLab Advanced SAST"
AND explanation != "False positive"
AND NOT answer LIKE "[Failure]%"
AND cwe IN ('CWE-22','CWE-78','CWE-79','CWE-88','CWE-89','CWE-918','CWE-94','CWE-95')
ORDER BY cwe ASC;
Results of the manual review
CWE-89 - SQL Injection
- 7 results have been reviewed
- 3 of them are perfect/excellent fixes
- 3 of them are effected by the "single blank line" bug (Duo VR - Generated MR only adds a new line at e... (#526882)), so they cannot be evaluated
- 1 of them is a detected FP for which an MR has been mistakenly created (see Duo VR - Incorrect handling of detected FP (#529951))
Meaningful reviewed items: 3
Accuracy: 100%
CWE-78 - Command Injection
- 7 results have been reviewed
- 4 of them are perfect fix
- 1 of them is very good
- 2 of them are mitigating the risk in a limited way
Meaningful reviewed items: 7
Accuracy: 82%
CWE-918 - SSRF
- 11 results have been reviewed
- 1 of them is making the vulnerable code even worse
- 2 of them don't address the vulnerability
- 1 of them fixes another CMDi vulnerability but not the SSRF
- 7 of them are insufficient partial remediations
Meaningful reviewed items: 11
Accuracy: 34%
CWE-95 - Eval Injection
- 18 results have been reviewed
- 3 of them have been fixed correctly, but it could have been fixed in a slightly safer way
- 1 of them was an FP, but the code has been improved to be safer
- 12 of them don't address the vulnerability
- 2 of them are affected by the "single blank line" bug (Duo VR - Generated MR only adds a new line at e... (#526882)), so they cannot be evaluated
Eval Injection is a critical vulnerability.
However, most of the results under CWE-95
are related to the low-severity Improper neutralization of directives in dynamically evaluated code
(like https://staging.gitlab.com/ai-evaluation/etv/eyeballvul-sequelize-0057668/-/security/vulnerabilities/7579612), which is a very different vulnerability and very difficult to fix in an automated way.
We should consider changing the corresponding rules. I think that Improper neutralization of directives in dynamically evaluated code
should not be marked as CWE-95
.
Duo VR cannot support CWE-95 until Improper neutralization of directives in dynamically evaluated code
are polluting this category.
Meaningful reviewed items: 16
Accuracy: 31%
CWE-88 - Parameter Pollution (Argument Injection)
- 1 result has been reviewed
- it's a good fix
I'm optimistic about the accuracy of this category, but we have only a single result in the dataset, we don't have enough data to make a data-driven decision.
Meaningful reviewed items: 1
Accuracy: 100%
CWE-22 - Path Traversal
- 16 results have been reviewed
- 12 of them are either partial fix or unrelated to the path traversal vulnerability
- 4 of them are affected by the "single blank line" bug (Duo VR - Generated MR only adds a new line at e... (#526882)), so they cannot be evaluated
Meaningful reviewed items: 12
Accuracy: 31%
CWE-79 - XSS
- 15 results have been reviewed
- 2 of them are perfect fix
- 13 of them don't address the vulnerability or break the existing functionality
Notice that some of the findings have been found in the internal of Django.
As the internals of Django are defining the safe abstractions, the implementation is inherently using unsafe functions. The LLM is then trying to solve an unsolvable problem and produces garbage.
Meaningful reviewed items: 15
Accuracy: 20%
CWE-94 - Code Injection
- 11 results have been reviewed
- 3 of them are very good fixes
- 4 of them are good fixes, but possibly breaking the functionality
- 4 of them probably break the functionality
Notice that some of the findings are SSTI.
The vulnerability description of Code Injection through eval
and SSTI are quite different,
so that's unfortunate that the vulnerability description is not given to the LLM Duo VR - Missing vulnerability_description input (#526865 - closed).
Notice that we have a lot of FP results for improper use of setTimeout
.
A bug has been reported here: FP in javascript-lang-codei-taint: setTimeout(f... (#531041 - closed)
Meaningful reviewed items: 11
Accuracy: 63%