AjaxCrawler does not start at the specified target URL

Summary

When DAST is used to test JavaScript-heavy applications, it can be instructed to use the AJAX crawler with the command-line parameter -j. While investigating https://gitlab.com/gitlab-org/gitlab-ee/issues/10448#note_197871139, I found that the AJAX crawler does not crawl the target URL specified in $DAST_WEBSITE / -t. Instead, it starts crawling from the root of the target website. For example, if the target website is http://webgoat.local/WebGoat/attack, it starts crawling from http://webgoat.local/.

This is problematic if the target website has no page at /. The crawler gets a 404 for / and does not crawl any other pages.
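
To make the observed behavior concrete, here is a minimal Python sketch of what "discarding the path" does to a target URL. The helper name truncate_to_root is hypothetical and is not taken from DAST or ZAP; it only mimics the effect described above.

```python
from urllib.parse import urlsplit, urlunsplit


def truncate_to_root(target: str) -> str:
    """Drop path, query and fragment, keeping only scheme and host (hypothetical helper)."""
    parts = urlsplit(target)
    return urlunsplit((parts.scheme, parts.netloc, "/", "", ""))


target = "http://webgoat.local/WebGoat/attack"
print(truncate_to_root(target))  # prints http://webgoat.local/
```

If the application serves nothing at /, the URL printed here is the only one the crawler ever requests.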

Steps to reproduce

1. Start a target application that does not have a page at /. For example, WebGoat.
2. Check out the DAST repo, cd into it and build the image with docker build --pull -t dast .
3. Start DAST with param -j so that it uses the AJAX crawler and with the log4j debug config, e.g. docker run -v $(pwd)/config/zap-log4j.properties:/root/.ZAP_D/log4j.properties dast /analyze -t http://goat:8080/WebGoat/attack -j.
Inspecting the HTTP requests sent by the AJAX crawler shows that it never requests the target URL http://goat:8080/WebGoat/attack. It requests http://goat:8080 instead (the path portion of the target URL is discarded). The relevant part of the HTTP log is:

2019-07-30 15:02:36,492 [ZAP-ProxyThread-20] DEBUG header - >> "GET / HTTP/1.1[\r][\n]"
2019-07-30 15:02:36,492 [ZAP-ProxyThread-20] DEBUG HttpMethodBase - Adding Host request header
2019-07-30 15:02:36,492 [ZAP-ProxyThread-20] DEBUG header - >> "User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:59.0) Gecko/20100101 Firefox/59.0[\r][\n]"
2019-07-30 15:02:36,492 [ZAP-ProxyThread-20] DEBUG header - >> "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8[\r][\n]"
2019-07-30 15:02:36,492 [ZAP-ProxyThread-20] DEBUG header - >> "Accept-Language: en-US,en;q=0.5[\r][\n]"
2019-07-30 15:02:36,492 [ZAP-ProxyThread-20] DEBUG header - >> "Connection: keep-alive[\r][\n]"
2019-07-30 15:02:36,492 [ZAP-ProxyThread-20] DEBUG header - >> "Upgrade-Insecure-Requests: 1[\r][\n]"
2019-07-30 15:02:36,492 [ZAP-ProxyThread-20] DEBUG header - >> "Cookie: JSESSIONID=0E1D6A906FD857B4405C12E5218F30CB[\r][\n]"
2019-07-30 15:02:36,492 [ZAP-ProxyThread-20] DEBUG header - >> "Host: goat:8080[\r][\n]"
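
For longer logs, the requested paths can be pulled out with a small script. The sketch below assumes the debug log uses the same line format as the excerpt above; the function name requested_paths and the regular expression are illustrative, not part of DAST.

```python
import re

# Matches the request line that the HTTP client logs at DEBUG level,
# e.g.: DEBUG header - >> "GET / HTTP/1.1[\r][\n]"
REQUEST_LINE = re.compile(
    r'DEBUG header - >> "(?:GET|POST|PUT|DELETE|HEAD|OPTIONS|PATCH) (\S+) HTTP/\d\.\d'
)


def requested_paths(log_text):
    """Return the request targets found in the log, in order of appearance."""
    return REQUEST_LINE.findall(log_text)


sample = r'''2019-07-30 15:02:36,492 [ZAP-ProxyThread-20] DEBUG header - >> "GET / HTTP/1.1[\r][\n]"
2019-07-30 15:02:36,492 [ZAP-ProxyThread-20] DEBUG header - >> "Host: goat:8080[\r][\n]"
'''

paths = requested_paths(sample)
print(paths)
print("/WebGoat/attack" in paths)
```

For the excerpt above, the only requested path is /, and /WebGoat/attack is absent.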

What is the current bug behavior?

The crawler starts at the root of the target website.

What is the expected correct behavior?

The crawler should start at the URL specified as the target website.

Documentation

Update the documentation to cover starting the crawl at a specific URL, not just a domain.

Triage

This issue occurs because the target URL is truncated in the ZAProxy Docker code: zap-baseline.py#L319.

As with other issues, we should be able to fix this by monkey patching the AJAX spider scan and passing in the original target URL.
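
A rough sketch of the monkey-patching idea follows. It is not the actual fix: FakeAjaxSpider is a hypothetical stand-in for the ZAP client's AJAX spider object, and the assumption that its scan method takes the start URL as its first argument is made for illustration only.

```python
import functools


class FakeAjaxSpider:
    """Hypothetical stand-in for the ZAP client's AJAX spider object."""

    def __init__(self):
        self.scanned = []

    def scan(self, url):
        # Record the URL the crawl would start from.
        self.scanned.append(url)
        return "OK"


def patch_scan(spider, original_target):
    """Wrap spider.scan so every crawl starts at original_target."""
    original_scan = spider.scan

    @functools.wraps(original_scan)
    def scan_with_full_url(url, *args, **kwargs):
        # Ignore the truncated URL passed by the caller; use the full target.
        return original_scan(original_target, *args, **kwargs)

    spider.scan = scan_with_full_url
    return spider


spider = patch_scan(FakeAjaxSpider(), "http://goat:8080/WebGoat/attack")
spider.scan("http://goat:8080/")  # what the truncating caller would pass
print(spider.scanned)  # prints ['http://goat:8080/WebGoat/attack']
```

The wrapper leaves the upstream script untouched and only swaps the argument at call time, which is why the approach works without changing the ZAProxy Docker code.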

Giving this a weight of 2.

Edited Oct 31, 2019 by Cameron Swords