Incorporate response content and structure into DAST vulnerability tracking
Problem to solve
By incorporating information about the response into vulnerability tracking, DAST scans could potentially:
- Reduce the number of Duplicate Findings
- Reduce the number of False Negatives
Intended users
Proposal
When DAST finds a vulnerability, it captures information about the response in which the vulnerability was found. This can be incorporated into vulnerability tracking:
- The response header `Content-Type`. Different content types (JavaScript, HTML, etc.) imply different pages.
- The response header `Content-Length`. Responses that differ significantly in content length imply different pages.
- The response body structure (HTML only). Responses that have different content structure imply different pages.
Once all of this information is incorporated into DAST vulnerability tracking, then in theory, the query string, fragment and any trailing forward slash could safely be removed from the URL when tracking vulnerabilities. This would remove duplicate findings, with some protection against introducing false negatives.
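The URL normalization described above could be sketched as follows. This is a minimal illustration using Python's standard `urllib.parse`, not the GitLab Rails implementation; the function name `tracking_url` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def tracking_url(url: str) -> str:
    """Drop the query string, fragment and trailing slash from a URL."""
    parts = urlsplit(url)
    # rstrip removes every trailing "/", so the site root ends up with an empty path.
    path = parts.path.rstrip("/")
    # Rebuild the URL with an empty query and an empty fragment.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


print(tracking_url("https://my.site/house?sort=asc"))
print(tracking_url("https://my.site/house/#top"))
```

On its own this normalization is the naive approach: it is only safe once the response attributes are part of the fingerprint as well.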
Example
For example, the following URLs are likely the same page:
- `https://my.site/house?sort=asc`, and `https://my.site/house`
If a vulnerability is found, it will appear twice on the Security Dashboard. One of these is a duplicate finding.
A naive solution could be to remove the query string from the vulnerability tracking URL comparison. Both URLs above would be tracked at `https://my.site/house`, solving the duplicate finding. However, the following URLs show different pages:
- `https://my.site/lesson?lesson=piano`, and `https://my.site/lesson?lesson=guitar`
If a vulnerability is found at each page, it will appear only once on the Security Dashboard, resulting in a false negative.
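To make the trade-off concrete, here is a rough sketch of a fingerprint that drops the query string but adds the response content type and a coarse content-length bucket. Everything in it is an assumption for illustration: the `fingerprint` function, the 4096-byte bucket size (the proposal leaves it TBD), the `-1` sentinel for a missing header, and the response sizes used in the example.

```python
import hashlib
from urllib.parse import urlsplit

BUCKET_BYTES = 4096  # hypothetical bucket size; the proposal leaves it TBD


def fingerprint(url, headers):
    """Hash the URL path together with coarse response attributes."""
    parts = urlsplit(url)
    # Header names are case-insensitive, so compare them in lower case.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Keep only the media type, e.g. "text/html; charset=utf-8" -> "text/html".
    content_type = lowered.get("content-type", "").split(";")[0].strip().lower()
    # A missing or non-numeric Content-Length maps to the sentinel bucket -1.
    raw_length = lowered.get("content-length", "").strip()
    is_number = raw_length.isascii() and raw_length.isdigit()
    bucket = int(raw_length) // BUCKET_BYTES if is_number else -1
    key = "|".join([parts.netloc, parts.path.rstrip("/"), content_type, str(bucket)])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


# Made-up response sizes, chosen only to illustrate the two cases above.
house_a = fingerprint("https://my.site/house?sort=asc",
                      {"Content-Type": "text/html; charset=utf-8", "Content-Length": "5120"})
house_b = fingerprint("https://my.site/house",
                      {"Content-Type": "text/html", "Content-Length": "5300"})
piano = fingerprint("https://my.site/lesson?lesson=piano",
                    {"Content-Type": "text/html", "Content-Length": "9000"})
guitar = fingerprint("https://my.site/lesson?lesson=guitar",
                     {"Content-Type": "text/html", "Content-Length": "21000"})

print(house_a == house_b)  # True: the duplicate finding collapses
print(piano == guitar)     # False: the two lesson pages stay distinct
```

Fixed-width buckets have a known weakness: two responses of nearly equal size can fall on either side of a bucket boundary and be treated as different pages, so the bucket size and comparison rule would need tuning against real scans.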
Implementation
- Incorporate `content-type` into vulnerability tracking
  - Update GitLab Rails to add response header `content-type` to the DAST fingerprint.
  - Handle case when there is no `content-type` response header.
- Incorporate `content-length` into vulnerability tracking
  - Content length should be normalized into buckets of TBD KB. If the content is TBD KB different from other content at the same URL, it can be considered a different page.
  - Update GitLab Rails to incorporate normalized `content-length` into the DAST fingerprint.
  - Handle case when there is no `content-length` response header.
- Incorporate content structure into vulnerability tracking
  - When DAST finds a vulnerability, parse the HTML response body. Strip the content back to just the DOM elements (e.g. `html` -> `body` -> `div` etc).
  - Convert the DOM elements to a tree structure.
  - Calculate the distance between the DOM tree structure and a standard-bearer DOM tree structure (e.g. `html` -> `body`). The distance should be a number, such that similar structures have similar distances when compared against the standard-bearer. treecompare might be a useful tool for this.
  - Expose the structure distance in the DAST report location.
  - Update the DAST schema to include the structure distance.
  - Structure distance should be normalized into buckets of TBD. If the distance is TBD different from other content at the same URL, it can be considered a different page.
  - Update GitLab Rails to incorporate normalized structure distance into the DAST fingerprint.
  - Handle case when response body is empty.
  - Handle case when response is not HTML.
  - Handle case when response body is invalid HTML.
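
The content structure steps above could be prototyped roughly as below. This sketch uses Python's standard `html.parser` and a deliberately simple distance: the number of element tag paths that appear in one document but not the other. It is not a true tree edit distance and it does not use treecompare; the names `SkeletonParser`, `skeleton` and `structure_distance` are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

# Void elements never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class SkeletonParser(HTMLParser):
    """Collect the tag path of every element, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = Counter()

    def handle_starttag(self, tag, attrs):
        # Record the path from the root to this element, e.g. "html/body/div".
        self.paths["/".join(self.stack + [tag])] += 1
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Tolerate invalid HTML: unwind to the nearest matching open tag, if any.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def skeleton(html):
    """Return a multiset of element paths; an empty body yields an empty multiset."""
    parser = SkeletonParser()
    parser.feed(html or "")
    parser.close()
    return parser.paths


def structure_distance(html, baseline="<html><body></body></html>"):
    """Count element paths present in one skeleton but not the other."""
    a, b = skeleton(html), skeleton(baseline)
    return sum(((a - b) + (b - a)).values())


print(structure_distance("<html><body><div><p>hi</p></div></body></html>"))
print(structure_distance("<html><body><ul><li>a<li>b</ul></body></html>"))
```

Because the result is a single number measured against one baseline, structurally different pages can still share a distance (in this sketch an empty body and a page with one `div` and one `p` both score 2), which is one reason the empty, non-HTML and invalid-HTML cases listed above need explicit handling.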