Optimize DAST reporting to reduce memory usage
Problem
DAST scans of target applications that make many HTTP requests are experiencing spikes in memory usage. This has caused the Runner to run out of memory on at least one customer scan.
The most likely cause
While it is not guaranteed to be the problem, it is known that store.NavigationResult.LoadAll() is extremely inefficient, and it is the only known contender for such large memory spikes.
LoadAll loads the results of all user actions attempted by DAST during the scan, which could number in the thousands for a long scan. Each NavigationResult object contains the HTTP requests and responses recorded during the action. It is common for request and response bodies to be many MBs in size. For example, if Chromium loads a 10 MB JavaScript file on every page, the 10 MB response body will be recorded in every navigation result, all of which will be loaded in memory when LoadAll is called.
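To make the scale concrete, here is a rough back-of-the-envelope sketch in Go. The figure of 2,000 navigation results is an assumption chosen to represent "thousands"; the 10 MB body size comes from the example above. The function name estimateLoadAllBytes is hypothetical and is not part of DAST.

```go
package main

import "fmt"

// estimateLoadAllBytes returns the number of response-body bytes held in
// memory at once if every navigation result is loaded together.
// Hypothetical helper for illustration only.
func estimateLoadAllBytes(results int, bodyBytes int64) int64 {
	return int64(results) * bodyBytes
}

func main() {
	const mib = 1 << 20

	// Assumption: 2,000 navigation results, each recording one 10 MiB body.
	total := estimateLoadAllBytes(2000, 10*mib)

	fmt.Printf("%.1f GiB held at once\n", float64(total)/(1<<30))
}
```

Under these assumptions the response bodies alone come to roughly 20 GB, which is well beyond what a typical Runner job has available.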
Customer
See Memory Usage % for an example of such a memory usage spike: https://gitlab.com/gitlab-com/sec-sub-department/section-sec-request-for-help/-/issues/125#note_1794526803
Proposal
Convert the offending method to stream results, or to return a specific set of results according to how it is used.
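As a minimal sketch of what "stream results" could look like, the Go program below contrasts a load-everything method with a callback-based iterator. All types here (Store, NavigationResult, HTTPMessage) and the load function are hypothetical stand-ins, not the real DAST types; the real store reads from persistent storage rather than generating data.

```go
package main

import "fmt"

// HTTPMessage is a hypothetical stand-in for a recorded request/response pair.
type HTTPMessage struct {
	URL          string
	ResponseBody []byte
}

// NavigationResult is a hypothetical stand-in for the result of one user action.
type NavigationResult struct {
	ID       int
	Messages []HTTPMessage
}

// Store is a hypothetical stand-in for the real store.
// load simulates reading a single result from storage.
type Store struct {
	count int
	load  func(id int) (NavigationResult, error)
}

// LoadAll materializes every result at once, so peak memory grows with
// the number of results.
func (s *Store) LoadAll() ([]NavigationResult, error) {
	results := make([]NavigationResult, 0, s.count)
	for id := 0; id < s.count; id++ {
		r, err := s.load(id)
		if err != nil {
			return nil, err
		}
		results = append(results, r)
	}
	return results, nil
}

// Iterate loads one result at a time and passes it to visit. A result can
// be garbage collected after visit returns, as long as visit does not keep
// a reference to it.
func (s *Store) Iterate(visit func(NavigationResult) error) error {
	for id := 0; id < s.count; id++ {
		r, err := s.load(id)
		if err != nil {
			return err
		}
		if err := visit(r); err != nil {
			return err
		}
	}
	return nil
}

// newDemoStore builds a store that generates count results, each with one
// response body of bodySize bytes.
func newDemoStore(count, bodySize int) *Store {
	return &Store{
		count: count,
		load: func(id int) (NavigationResult, error) {
			return NavigationResult{
				ID: id,
				Messages: []HTTPMessage{{
					URL:          fmt.Sprintf("https://example.test/page/%d", id),
					ResponseBody: make([]byte, bodySize),
				}},
			}, nil
		},
	}
}

func main() {
	s := newDemoStore(5, 1024)

	// Sum body sizes without ever holding more than one result.
	total := 0
	err := s.Iterate(func(r NavigationResult) error {
		for _, m := range r.Messages {
			total += len(m.ResponseBody)
		}
		return nil
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("bytes seen:", total)
}
```

The callback form keeps the caller in control of what is retained: a consumer that only needs URLs can copy out the strings and let the bodies be collected.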
Implementation plan
- Change printer.UniqueAuditedURLS to use store.NavigationResult.IterateHTTPMessages
- Change services.SecurityReportFormatter to use store.NavigationResult.IterateHTTPMessages
- Remove CrawlGraph.GetNavigationResults and store.NavigationResult.LoadAll()
- Add a changelog entry
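
The first step of the plan might look roughly like the sketch below. The signature of store.NavigationResult.IterateHTTPMessages is not given in this issue, so the MessageIterator type is an assumed callback-style shape, and HTTPMessage, uniqueAuditedURLs, and sliceIterator are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// HTTPMessage is a hypothetical stand-in carrying only what the report needs.
type HTTPMessage struct {
	Method string
	URL    string
}

// MessageIterator is an ASSUMED shape for IterateHTTPMessages: it calls
// visit once per recorded message and stops on the first error.
type MessageIterator func(visit func(HTTPMessage) error) error

// uniqueAuditedURLs collects distinct URLs while streaming messages.
// Only the URL strings are retained, never the message bodies.
func uniqueAuditedURLs(iterate MessageIterator) ([]string, error) {
	seen := make(map[string]struct{})
	err := iterate(func(m HTTPMessage) error {
		seen[m.URL] = struct{}{}
		return nil
	})
	if err != nil {
		return nil, err
	}

	urls := make([]string, 0, len(seen))
	for u := range seen {
		urls = append(urls, u)
	}
	sort.Strings(urls) // deterministic order for reporting
	return urls, nil
}

// sliceIterator adapts a fixed slice to MessageIterator for the demo.
func sliceIterator(msgs []HTTPMessage) MessageIterator {
	return func(visit func(HTTPMessage) error) error {
		for _, m := range msgs {
			if err := visit(m); err != nil {
				return err
			}
		}
		return nil
	}
}

func main() {
	it := sliceIterator([]HTTPMessage{
		{Method: "GET", URL: "https://example.test/b"},
		{Method: "GET", URL: "https://example.test/a"},
		{Method: "POST", URL: "https://example.test/b"},
	})

	urls, err := uniqueAuditedURLs(it)
	if err != nil {
		panic(err)
	}
	fmt.Println(urls)
}
```

With this shape, memory use is bounded by the number of distinct URLs rather than by the total size of recorded bodies. The same pattern should apply to services.SecurityReportFormatter, assuming it only needs per-message data.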