DAST may deadlock in rare situations
Summary
In rare situations (see below) DAST may deadlock in its interactions with the Chromium DevTools web socket. When this happens, the scan will hang until a timeout is reached that breaks the deadlock. Depending on which timeout is reached, the scan may continue with degraded functionality (e.g. a particular navigation path is not continued when it should have been), or the scan may fail (e.g. if the timeout that is reached is a page load during the auth process).
If you think you are encountering this issue, see Diagnosis.
Problem
Analysis
DAST records all requests/responses made as part of the scan both to find new navigation paths and to scan them for vulnerabilities. It does that in part by intercepting the Fetch.requestPaused event from DevTools (which is sent both as the request is being sent out and as the response is being received) and issuing a Fetch.getResponseBody request. However, in some cases it is possible for the request to complete without the body being captured. This should be a very rare occurrence, but the presence of other bugs may cause it to happen more often (as was the case with DAST degrades when communication with Chromium ... (#478482 - closed)). When a Network.loadingFinished event occurs for a request and DAST doesn't have the response body, it is fetched with Network.getResponseBody.
We have observed that Network.getResponseBody does not immediately receive a reply from Chromium while there is an outstanding Fetch.requestPaused event. This is possibly because the Fetch domain does not produce fire-and-forget events the way most DevTools domains do. Fetch.requestPaused requires the client that enabled it to tell Chromium how to continue by sending a reply with one of continueRequest, failRequest or fulfillRequest. Until this reply is received, the request is paused. This pause blocks the Network.getResponseBody query (even though the query is for a different request). We are unsure whether this block is a lock on an internal store of requests/responses, a lock on the WebSocket itself, or something else.
Because DAST handles DevTools events serially, and because it makes queries to DevTools synchronously, this block can cause a deadlock:
- DevTools sends a Network.loadingFinished event for a request for which DAST doesn't have the response body
- DevTools sends a Fetch.requestPaused event for a new request
- DAST processes the Network.loadingFinished event by sending a synchronous query for Network.getResponseBody
This is the deadlock:
- Chromium is waiting for a reply to Fetch.requestPaused, which DAST can't send because it is still processing Network.loadingFinished.
- DAST is waiting for a reply to Network.getResponseBody, which Chromium can't send because it is waiting for a reply to Fetch.requestPaused.
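The circular wait described above can be reproduced in a small simulation. This is only a sketch, not DAST's code: `fakeChromium`, `processSerially` and the millisecond timeout are hypothetical names and values, and the fake browser models nothing more than the observed behaviour (the reply to Network.getResponseBody is withheld until the paused request is continued).

```go
package main

import (
	"fmt"
	"time"
)

// fakeChromium is a hypothetical stand-in for the browser. It models only the
// behaviour observed in this issue: the reply to Network.getResponseBody is
// withheld while a paused request is outstanding.
type fakeChromium struct {
	resumed chan struct{} // closed once the paused request is continued
}

// getResponseBody blocks like a synchronous DevTools query. It reports false
// if no reply arrived before the timeout.
func (c *fakeChromium) getResponseBody(timeout time.Duration) bool {
	select {
	case <-c.resumed:
		return true
	case <-time.After(timeout):
		return false
	}
}

// continueRequest stands in for the reply to Fetch.requestPaused.
func (c *fakeChromium) continueRequest() { close(c.resumed) }

// processSerially handles events one at a time on a single goroutine and
// reports whether the response body arrived before the timeout.
// It expects at most one Fetch.requestPaused event in the slice.
func processSerially(events []string, timeoutMs int) bool {
	c := &fakeChromium{resumed: make(chan struct{})}
	gotBody := false
	for _, ev := range events {
		switch ev {
		case "Network.loadingFinished":
			// Synchronous query: the loop cannot advance until this returns.
			gotBody = c.getResponseBody(time.Duration(timeoutMs) * time.Millisecond)
		case "Fetch.requestPaused":
			c.continueRequest()
		}
	}
	return gotBody
}

func main() {
	deadlockOrder := []string{"Network.loadingFinished", "Fetch.requestPaused"}
	safeOrder := []string{"Fetch.requestPaused", "Network.loadingFinished"}
	fmt.Println("body received, paused request still queued:", processSerially(deadlockOrder, 50))
	fmt.Println("body received, paused request already continued:", processSerially(safeOrder, 50))
}
```

In the first ordering the only thing that ends the wait is the timeout, which mirrors how the real scan recovers; in the second ordering the query returns immediately.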
How do we know the lock is on the Chromium side and not on the DAST side?
We know from logs that the Network.getResponseBody query is sent over the web socket immediately.
When the request from Chromium that is participating in the deadlock is a page navigation, and that navigation hits its timeout, DAST cancels the navigation by sending a Page.stopLoading message to DevTools. The reply to that message is received immediately, then the reply to Network.getResponseBody is received. This sequence shows that the reply to Network.getResponseBody was not sent by Chromium until after we sent Page.stopLoading, terminating the outstanding network requests.
Diagnosis
To determine if you are encountering this deadlock, DevTools logging must be enabled; it is also helpful to enable debug logging for the BROWS module (DAST_LOG_FILE_CONFIG: "BROWS:debug"). The telltale sign of the deadlock is a Network.getResponseBody message being sent to Chromium, immediately followed by an extended period (several seconds at least) of no DevTools activity. This period of inactivity will also typically be accompanied by uninterrupted messages that DAST is "waiting for" something to happen (what specifically it is waiting for depends on what caused the request that Network.getResponseBody is deadlocking with):
2024-08-08T00:07:51.772 TRC CHROM event received {"method":"Network.loadingFinished","params":{"requestId":"75.17","timestamp":71277.011315,"encodedDataLength":555}} task="general" method="Network.loadingFinished"
2024-08-08T00:07:51.772 TRC CHROM request sent {"id":264,"method":"Network.getResponseBody","params":{"requestId":"75.17"}} task="general" id="264" method="Network.getResponseBody"
2024-08-08T00:07:51.809 DBG BROWS page is transitioning, navigation load will wait for page transition to complete
2024-08-08T00:07:51.809 DBG BROWS waiting for page transition to complete timeout_in="14.999s"
2024-08-08T00:07:51.961 DBG BROWS waiting for page transition to complete timeout_in="14.848s"
...
2024-08-08T00:07:56.762 DBG BROWS waiting for page transition to complete timeout_in="10.047s"
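Spotting this pattern by eye in a long log is tedious, so a small scanner can help. The following is a rough sketch that assumes the log layout shown in the sample above (timestamp, level, module, message); `findStalls` and the 3000 ms threshold are hypothetical choices, not part of DAST.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// tsLayout matches the timestamp prefix of the sample log lines above.
const tsLayout = "2006-01-02T15:04:05.000"

// findStalls returns the timestamp of every Network.getResponseBody request
// that is followed by at least minGapMs without any CHROM (DevTools) line.
func findStalls(lines []string, minGapMs int) []string {
	minGap := time.Duration(minGapMs) * time.Millisecond
	var stalls []string
	var sentAt, last time.Time
	sentRaw := "" // timestamp text of the query still awaiting DevTools activity
	for _, line := range lines {
		fields := strings.Fields(line)
		if len(fields) < 3 {
			continue // not a log line in the assumed layout
		}
		ts, err := time.Parse(tsLayout, fields[0])
		if err != nil {
			continue
		}
		last = ts
		if fields[2] != "CHROM" {
			continue // only DevTools traffic ends a period of inactivity
		}
		if sentRaw != "" && ts.Sub(sentAt) >= minGap {
			stalls = append(stalls, sentRaw)
		}
		sentRaw = ""
		if strings.Contains(line, "request sent") &&
			strings.Contains(line, `method="Network.getResponseBody"`) {
			sentAt, sentRaw = ts, fields[0]
		}
	}
	// The log may end while the query is still unanswered.
	if sentRaw != "" && last.Sub(sentAt) >= minGap {
		stalls = append(stalls, sentRaw)
	}
	return stalls
}

func main() {
	logLines := []string{
		`2024-08-08T00:07:51.772 TRC CHROM request sent {"id":264,"method":"Network.getResponseBody","params":{"requestId":"75.17"}} task="general" id="264" method="Network.getResponseBody"`,
		`2024-08-08T00:07:51.809 DBG BROWS waiting for page transition to complete timeout_in="14.999s"`,
		`2024-08-08T00:07:56.762 DBG BROWS waiting for page transition to complete timeout_in="10.047s"`,
	}
	fmt.Println(findStalls(logLines, 3000))
}
```

A hit is only a hint: a slow but healthy response can also produce a gap, so the surrounding "waiting for" messages should still be checked.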
Proposal
There are several possible ways to prevent the deadlock:
1. Create a separate processing queue for DevTools messages where "high-priority" messages are processed with a dedicated thread.
2. Instead of finalizing requests synchronously in response to Network.loadingFinished, create a requestsToFinalize queue that is processed in a separate thread.
3. Don't treat Network.getResponseBody as a synchronous query. Instead send the query and then return from the Network.loadingFinished event handler. Then finish finalizing the request when the reply to Network.getResponseBody is received.
Options 1 and 2 are mechanically equivalent, just at different points in the stack. All three seek to avoid making a synchronous Network.getResponseBody query on the same thread that is handling DevTools events.
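To illustrate the third approach, here is a minimal single-threaded sketch in which the event handler registers a callback for the reply instead of waiting for it. Everything in it is an assumption made for illustration: the `client` and `message` types, `sendAsync`, and the simulated browser in `simulate` are hypothetical and do not reflect DAST's actual implementation.

```go
package main

import "fmt"

// message is a simplified DevTools message: events carry a Method,
// replies carry only the ID of the query they answer.
type message struct {
	ID     int
	Method string
}

// client is a hypothetical serial DevTools client whose handlers never block.
type client struct {
	nextID  int
	pending map[int]func() // reply callbacks keyed by query ID
	sent    []message      // queries written to the simulated web socket
	log     []string
}

// sendAsync records the query and returns immediately; onReply runs later,
// when the matching reply is dispatched by handle.
func (c *client) sendAsync(method string, onReply func()) {
	c.nextID++
	c.pending[c.nextID] = onReply
	c.sent = append(c.sent, message{ID: c.nextID, Method: method})
	c.log = append(c.log, "sent "+method)
}

// handle processes one incoming message without waiting on the browser.
func (c *client) handle(m message) {
	if m.Method == "" { // a reply: run and discard its callback
		if cb, ok := c.pending[m.ID]; ok {
			delete(c.pending, m.ID)
			cb()
		}
		return
	}
	switch m.Method {
	case "Network.loadingFinished":
		c.sendAsync("Network.getResponseBody", func() {
			c.log = append(c.log, "finalized request")
		})
	case "Fetch.requestPaused":
		c.sendAsync("Fetch.continueRequest", func() {})
	}
}

// simulate feeds the deadlock-prone event order through the client. The fake
// browser withholds every reply until it has seen Fetch.continueRequest.
func simulate() []string {
	c := &client{pending: map[int]func(){}}
	inbox := []message{{Method: "Network.loadingFinished"}, {Method: "Fetch.requestPaused"}}
	var held []message
	paused := true
	for len(inbox) > 0 {
		m := inbox[0]
		inbox = inbox[1:]
		c.handle(m)
		for _, q := range c.sent {
			if q.Method == "Fetch.continueRequest" {
				paused = false
			}
			held = append(held, message{ID: q.ID})
		}
		c.sent = nil
		if !paused {
			inbox = append(inbox, held...)
			held = nil
		}
	}
	return c.log
}

func main() {
	for _, entry := range simulate() {
		fmt.Println(entry)
	}
}
```

Because the Network.loadingFinished handler returns right after sending its query, the loop reaches Fetch.requestPaused and continues the request, which in turn lets the withheld reply arrive. The cost of this design is that request finalization becomes a two-step state machine, so any code that assumed a request is complete once the handler returns would need to be revisited.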