
DAST crawl tasks are not running in parallel

Problem

DAST seeds the crawler with a crawl task that loads the target URL. The crawler then starts by running crawl tasks in a browser, finding new crawl tasks (e.g. "click on link"), and repeating until there are no more crawl tasks.

The DAST configuration DAST_BROWSER_NUMBER_OF_BROWSERS represents the maximum number of crawl tasks that can run at one time. This defaults to three.

Unfortunately, even with the number of browsers set to a value higher than one, DAST does not run any crawl tasks in parallel. This significantly degrades the performance of DAST scans.
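For illustration, the intended behaviour (a seeded crawl that keeps discovering tasks and runs at most a fixed number of them at once) could be sketched as follows. This is a minimal sketch and not Browserker's implementation: crawlAll, crawlFunc, and the toy site map are hypothetical names introduced here, and a buffered channel stands in for the browser pool.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// crawlFunc is a hypothetical stand-in for running one crawl task in a
// browser. It returns the crawl tasks discovered while running the task.
type crawlFunc func(task string) []string

// crawlAll starts from seed and runs every discovered task exactly once,
// with at most numBrowsers tasks in flight. It returns the tasks that ran
// and the peak number of tasks that ran at the same time.
func crawlAll(seed string, numBrowsers int, run crawlFunc) (visited []string, peak int) {
	var (
		mu    sync.Mutex
		wg    sync.WaitGroup
		seen  = map[string]bool{seed: true}
		inUse int
	)
	// Buffered channel used as a counting semaphore (assumed pool model).
	browsers := make(chan struct{}, numBrowsers)

	var schedule func(task string)
	schedule = func(task string) {
		wg.Add(1)
		go func() {
			defer wg.Done()
			browsers <- struct{}{} // acquire a browser slot

			mu.Lock()
			inUse++
			if inUse > peak {
				peak = inUse
			}
			mu.Unlock()

			found := run(task)

			mu.Lock()
			inUse--
			visited = append(visited, task)
			var fresh []string
			for _, t := range found {
				if !seen[t] {
					seen[t] = true
					fresh = append(fresh, t)
				}
			}
			mu.Unlock()

			<-browsers // return the browser slot

			// Scheduling happens before the deferred wg.Done, so the
			// WaitGroup counter never reaches zero while work remains.
			for _, t := range fresh {
				schedule(t)
			}
		}()
	}

	schedule(seed)
	wg.Wait()
	return visited, peak
}

func main() {
	// Toy link graph used only for this example.
	site := map[string][]string{
		"/":  {"/a", "/b", "/c"},
		"/a": {"/b", "/d"},
		"/d": {"/"},
	}
	run := func(task string) []string {
		time.Sleep(50 * time.Millisecond) // simulate page load time
		return site[task]
	}
	visited, peak := crawlAll("/", 3, run)
	fmt.Println("tasks run:", len(visited), "peak in flight:", peak)
}
```

In this sketch the peak never exceeds numBrowsers, and with numBrowsers set to 1 the tasks run strictly one after another, which matches the behaviour currently observed regardless of the configured value.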

Proposal

Investigate why setting NumBrowsers in Browserker does not cause crawl jobs to run in parallel, and fix it.

Reference

https://gitlab.com/gitlab-com/sec-sub-department/section-sec-request-for-help/-/issues/165+

Evidence

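Running grep "DBG BPOOL" test_browserker_scan.log on the log generated from a simplified version of test_browserker_scan() produces the following output. Every acquired line is followed by a returned line for the same browser id before the next acquired line appears, so no more than one browser is in use at any time and crawl tasks are not running in parallel.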

2023-12-14T12:32:44.113 DBG BPOOL acquired browser from the pool id="7826317613731332128"
2023-12-14T12:32:46.596 DBG BPOOL returned browser to the pool id="7826317613731332128"
2023-12-14T12:32:46.598 DBG BPOOL acquired browser from the pool id="1894748505353589243"
2023-12-14T12:32:48.763 DBG BPOOL returned browser to the pool id="1894748505353589243"
2023-12-14T12:32:48.763 DBG BPOOL acquired browser from the pool id="7616396462285414804"
2023-12-14T12:32:50.910 DBG BPOOL returned browser to the pool id="7616396462285414804"
2023-12-14T12:32:50.911 DBG BPOOL acquired browser from the pool id="6921836113081413710"
2023-12-14T12:32:55.368 DBG BPOOL returned browser to the pool id="6921836113081413710"
2023-12-14T12:32:55.370 DBG BPOOL acquired browser from the pool id="4910366700867892899"
2023-12-14T12:32:58.173 DBG BPOOL returned browser to the pool id="4910366700867892899"
2023-12-14T12:32:58.174 DBG BPOOL acquired browser from the pool id="7085433975950022948"
2023-12-14T12:33:02.701 DBG BPOOL returned browser to the pool id="7085433975950022948"
2023-12-14T12:33:02.703 DBG BPOOL acquired browser from the pool id="3124536843791108293"
2023-12-14T12:33:06.335 DBG BPOOL returned browser to the pool id="3124536843791108293"
2023-12-14T12:33:06.337 DBG BPOOL acquired browser from the pool id="4705944725955125302"
2023-12-14T12:33:10.009 DBG BPOOL returned browser to the pool id="4705944725955125302"
2023-12-14T12:33:10.105 DBG BPOOL shutting down browser pool

This has been independently verified using two different customer scans.
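The check on the log output can also be automated rather than read by eye. The following is a minimal sketch that counts how many browsers are checked out at once; it assumes the log lines contain the "BPOOL acquired browser" and "BPOOL returned browser" messages exactly as shown in the excerpt, and maxConcurrentBrowsers is a hypothetical helper, not part of Browserker.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// maxConcurrentBrowsers scans BPOOL log lines in order and returns the
// highest number of browsers checked out of the pool at the same time.
func maxConcurrentBrowsers(lines []string) int {
	inUse, peak := 0, 0
	for _, line := range lines {
		switch {
		case strings.Contains(line, "BPOOL acquired browser"):
			inUse++
			if inUse > peak {
				peak = inUse
			}
		case strings.Contains(line, "BPOOL returned browser"):
			if inUse > 0 {
				inUse--
			}
		}
	}
	return peak
}

func main() {
	// Reads log lines from stdin, for example:
	//   grep "DBG BPOOL" test_browserker_scan.log | go run main.go
	var lines []string
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		lines = append(lines, scanner.Text())
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "reading input:", err)
		os.Exit(1)
	}
	fmt.Println("peak browsers in use:", maxConcurrentBrowsers(lines))
}
```

A peak of 1 on a scan configured with more than one browser indicates the serial behaviour described in this issue; once crawl tasks run in parallel, the peak should be able to reach the configured number of browsers.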

Implementation plan

  • Refactor the crawl service to allow more than one job to run at a time
  • Convert ProduceNextPathsToCrawl.uncrawledPaths to accept browserk.Path
Edited by Cameron Swords