Partially observed requests cause scanner to hang waiting for page load

When the crawler navigates to a page, it waits for the page to become stable, and part of stability is waiting for any pending background requests to complete. A request that the crawler knows has finished (because a Network.loadingFinished event was received) but whose response was never observed (i.e. no Network.responseReceived event was received) remains "pending" forever, forcing the page load to wait for the entire timeout duration before continuing.

It is unclear why we would receive Network.requestWillBeSent and Network.loadingFinished events for a request but not a Network.responseReceived or Network.loadingFailed event. This could be due to another scanner bug, or to a behavior of the DevTools protocol under particular circumstances that we are not familiar with. Although we don't know how it happens, we have observed this behavior in a recent customer report (internal link).

Proposal

The scanner should consider any request for which Network.loadingFinished has been received as "not pending" for purposes of page stability, regardless of what other events have or have not been received for that request.

Implementation

  1. In browser.MessageContainer:
    1. Add a finished field
    2. Add a Finish method which sets finished = true
    3. In the PendingNecessaryRequestID method, in the first guard clause, add m.finished to the list of conditions
  2. In browser.Container:
    1. Add a Finish(requestID string) method. Follow the pattern of other request state-changing methods (such as AddNetworkResponseReceivedEvent) to call Finish on the MessageContainer.
  3. In service.HTTPMessageService.Finalize, after the first guard clause, use a defer to ensure that the request is marked as finished regardless of whether finalization succeeds or fails: defer func() { container.Finish(requestID) }()
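
The steps above can be sketched roughly as follows. This is a simplified, self-contained illustration and not the scanner's actual code: the types here only borrow the names MessageContainer and Container, and the responseReceived field, the return signature of PendingNecessaryRequestID, the AddRequest and PendingRequestIDs helpers, and the standalone finalize function (standing in for service.HTTPMessageService.Finalize) are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// MessageContainer is a simplified stand-in for browser.MessageContainer.
// Only the finished field and Finish method come from the proposal; the
// other fields are assumptions for this sketch.
type MessageContainer struct {
	requestID        string
	responseReceived bool // assumed: set when Network.responseReceived is seen
	finished         bool // set when Network.loadingFinished is handled
}

// Finish marks the request as finished.
func (m *MessageContainer) Finish() { m.finished = true }

// PendingNecessaryRequestID reports whether the request still blocks page
// stability. The (string, bool) signature is an assumption.
func (m *MessageContainer) PendingNecessaryRequestID() (string, bool) {
	// First guard clause: m.finished is added to the list of conditions.
	if m.responseReceived || m.finished {
		return "", false
	}
	return m.requestID, true
}

// Container is a simplified stand-in for browser.Container.
type Container struct {
	mu       sync.Mutex
	messages map[string]*MessageContainer
}

func NewContainer() *Container {
	return &Container{messages: make(map[string]*MessageContainer)}
}

// AddRequest is a hypothetical helper that registers a new request.
func (c *Container) AddRequest(requestID string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.messages[requestID] = &MessageContainer{requestID: requestID}
}

// Finish looks up the message for requestID and marks it as finished.
func (c *Container) Finish(requestID string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if m, ok := c.messages[requestID]; ok {
		m.Finish()
	}
}

// PendingRequestIDs is a hypothetical helper listing requests still pending.
func (c *Container) PendingRequestIDs() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	var ids []string
	for _, m := range c.messages {
		if id, pending := m.PendingNecessaryRequestID(); pending {
			ids = append(ids, id)
		}
	}
	return ids
}

var errNoResponse = errors.New("no response observed for request")

// finalize stands in for service.HTTPMessageService.Finalize. It always
// fails here, to show that the deferred Finish runs on the error path too.
func finalize(c *Container, requestID string) error {
	if requestID == "" { // first guard clause (assumed shape)
		return errors.New("empty request ID")
	}
	defer func() { c.Finish(requestID) }()
	return errNoResponse
}

func main() {
	c := NewContainer()
	c.AddRequest("req-1") // request seen, response never observed
	fmt.Println("pending before finalize:", len(c.PendingRequestIDs()))
	err := finalize(c, "req-1")
	fmt.Println("finalize error:", err)
	fmt.Println("pending after finalize:", len(c.PendingRequestIDs()))
}
```

Placing the defer after the guard clause means the request stops counting as pending on every later return path, including early error returns, without having to repeat the Finish call at each one.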

Testing

Testing these kinds of complex interactions with an actual browser would be preferable. However, since we don't know under what circumstances this problem occurs, the only way to reproduce it would be to manipulate the event stream from CDP before the scanner sees it, and we currently don't have any infrastructure to do that.

So instead we can add tests to http_message_service_test.go and message_container_test.go, using the mock.GCD functionality already in use in similar tests.
