Speed: Process work items in parallel (WebRunnerMachine)

Problem

The WebRunnerCheckStrategy generates IWebRunnerWorkItems which are currently processed in series. To improve speed and allow taking advantage of a larger testing machine with multiple CPUs, the work items should be processed in parallel up to some reasonable max parallelism based on CPU/Core counts.

Proposal

Use the .NET Dataflow objects to process work items generated by WebRunnerCheckStrategy in parallel. The existing 344864-threaded-runner branch will be used as a reference only. The refactoring of WebRunnerCheckStrategy should provide a much better foundation for threading than existed in the prior working branch.

NOTE: Please keep the 344864-threaded-runner branch until @mikeeddington returns. Some hot-path methods were optimized in this branch.

One issue found in the existing branch was that the minimum worker count on ThreadPool should not match the vCPU count; it should be higher by at least 1. On single-vCPU runners, such as the GitLab shared runners, increasing the worker thread count improved performance.

To determine the correct levels for the ThreadPool MinThreads WorkerThreads value and for MaxDegreeOfParallelism on the TransformBlock, it's recommended to build a 4 vCPU runner in GCP to use with a custom label, and also to use the shared runners with a single vCPU. Do not expose any new variables for configuring parallelism at this time.

- At the start of the job, run each operation record to generate work items
  - #353293 (closed)
- Work items are added to a TransformBlock for processing
- Automatically check CPU/core count and update max parallelization and task limits accordingly
  - Override task minimums to prevent a major slowdown on 1 vCPU runners
- Work items
  - Active checks
  - Mutational checks
    - Parameter + Check
- Documentation update on how to increase testing speed using a larger instance
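
The flow in the list above can be sketched in a language-neutral way. This is a minimal Python illustration, not the .NET Dataflow implementation: `pick_max_parallelism` and `process_work_items` are hypothetical names, and the "vCPUs + 1, at least 2" policy is only an assumption based on the observation that the worker count should exceed the vCPU count by at least 1. The real values still need to be measured on the 1 vCPU and 4 vCPU runners.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def pick_max_parallelism(cpu_count):
    """Hypothetical policy: one more worker than vCPUs, never fewer than 2."""
    return max(2, (cpu_count or 1) + 1)


def process_work_items(work_items, handler, max_parallelism):
    """Run handler over work_items with at most max_parallelism in flight.

    pool.map yields results in input order, regardless of completion order.
    """
    with ThreadPoolExecutor(max_workers=max_parallelism) as pool:
        return list(pool.map(handler, work_items))


if __name__ == "__main__":
    # Stand-ins for the work items generated from the operation records.
    items = [("active", n) for n in range(3)] + [("mutational", n) for n in range(3)]
    limit = pick_max_parallelism(os.cpu_count())
    results = process_work_items(items, lambda item: f"{item[0]}-{item[1]} done", limit)
    print(limit, results)
```

The bounded worker pool plays the role that MaxDegreeOfParallelism plays on the TransformBlock: the limit is computed once from the CPU count and nothing is exposed to the user.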

Implementation

David's transition notes as of 7/17:

Configuration via environment variable is implemented and tested
1. During testing it was found that setting the value to 0 causes worker-entry to not serialize the value in the create session request (presumably because it is a default value, but I was not able to track down specifically where this determination was made). This effectively makes 0 equal to not setting it at all, which causes the DOP setting to be calculated based on CPUs. That seemed like a reasonable meaning for 0, particularly because 0 is not a usable MaxDegreeOfParallelism for the Dataflow blocks (the option accepts only -1 or a value of 1 or more), so it is documented and tested that way. Although it cannot be set from worker-entry, if a 0 does show up in the runner options (because .NET integration tests do not strip the 0 value, for example), it is explicitly recognized as being equivalent to null. See this code comment.
2. Setting the value to -1 means "Unbounded" to Dataflow. While initially it seemed like it might be a good idea to support this value, integration testing with unbounded DOP showed that it can easily overwhelm the target. Since it seemed likely to cause more problems than it solves, I decided not to support it. See this code comment and this one.
3. Initially I named this MAX_CONCURRENT_REQUESTS, trying to tie it to something the user would understand. However, while I think it is accurate for now (though hard to prove), I realized that might be a stronger commitment than we are willing to make. So I changed it to MAX_CONCURRENCY as recommended in the task list.
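
The rules in the three notes above can be summarized as a small decision function. This is a Python sketch of the described behavior, not the actual scanner code: `resolve_max_concurrency` is a hypothetical name, the CPU-based fallback of "vCPUs + 1, at least 2" is an assumption, and raising an error for negative values is only one possible way of "not supporting" -1.

```python
import os


def resolve_max_concurrency(raw_value, cpu_count):
    """Map a MAX_CONCURRENCY setting to an effective degree of parallelism.

    None, "" or 0 -> treated as unset; derive the value from the CPU count
    negative      -> rejected (unbounded parallelism can overwhelm the target)
    positive      -> used as given
    """
    if raw_value is None or raw_value == "":
        value = 0  # unset and 0 are deliberately equivalent
    else:
        value = int(raw_value)

    if value == 0:
        # Assumed fallback policy; the real calculation may differ.
        return max(2, cpu_count + 1)
    if value < 0:
        # Assumption: reject rather than silently fall back.
        raise ValueError("MAX_CONCURRENCY must be 0 (auto) or a positive integer")
    return value


if __name__ == "__main__":
    setting = os.environ.get("MAX_CONCURRENCY")
    print(resolve_max_concurrency(setting, os.cpu_count() or 1))
```

Treating a missing value and 0 identically matches the observation that worker-entry drops the 0 before it reaches the runner, so both paths end up in the CPU-based calculation.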
Testing
1. A few tests are not passing:
  1. tests_int_config_fuzzing-headers-quick and tests_int_config_fuzzing-quick need to have expectations updated. StatusCode-based vulnerabilities coming from our Flask target are now unreliable, since they depend on the order of the incoming requests. The solution (for now) is to ignore status code vulnerabilities in our integration tests.
  2. ShouldFindVulnerabilityRemoteFileInclusionInJsonBodyWithEmbeddedXml integration test is failing intermittently because XmlExternalEntityCheck is stateful and not safe for concurrent execution¹.
  3. There is a Debug.Assert in BlindInjectionAssertion that is causing multiple integration tests to fail because BlindInjectionAssertion is stateful and not safe for concurrent execution¹. The Debug.Assert is currently commented out to prevent it from masking other issues.
2. New integration tests specifically for concurrent execution have been added
  1. These tests use a RunnerOptions property, not intended to be exposed to consumers, that defers running checks until after all recording is done. We could make this the default behavior instead, which would simplify the code slightly; but since we previously talked about keeping the current behavior, I made it an option for now. See this code comment.
  2. These tests are sending more requests than necessary, and could be faster if the requests could be reduced. See this code comment.
3. Worker-entry tests for configuration of concurrent execution have been added
  1. Testing that parallel execution is actually happening at this level would be quite difficult, as well as redundant with the integration tests. So these tests do as little as possible and focus on confirming that the setting is passed correctly from worker-entry to the scanner.

¹ The spike branch created a MutationContext which could hold the state from these checks in order to make them safe for concurrent execution. However, that solution caused large changes throughout the entire architecture. Instead, we discussed creating concurrency-safe caches inside the checks, keyed to each work item, to store the needed state.
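
A cache keyed to each work item, as discussed in the footnote, might look roughly like the following. This is a Python sketch of the idea only: `WorkItemStateCache`, `get_or_create`, and `release` are hypothetical names, and the real checks would presumably use a .NET concurrent collection such as ConcurrentDictionary instead of a lock around a dictionary.

```python
import threading


class WorkItemStateCache:
    """Thread-safe store of per-work-item state for a stateful check."""

    def __init__(self):
        self._lock = threading.Lock()
        self._states = {}

    def get_or_create(self, work_item_id, factory):
        """Return the state for this work item, creating it on first use."""
        with self._lock:
            if work_item_id not in self._states:
                self._states[work_item_id] = factory()
            return self._states[work_item_id]

    def release(self, work_item_id):
        """Drop the state once the work item has finished."""
        with self._lock:
            self._states.pop(work_item_id, None)


if __name__ == "__main__":
    cache = WorkItemStateCache()

    def run_check(work_item_id):
        # Each work item sees only its own state, so concurrent runs
        # of the same check cannot overwrite each other's data.
        state = cache.get_or_create(work_item_id, list)
        state.append(f"response for {work_item_id}")

    threads = [threading.Thread(target=run_check, args=(n,)) for n in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print([cache.get_or_create(n, list) for n in range(4)])
```

Because the state lives inside the check and is looked up by work item, the rest of the architecture does not need to carry a context object around, which is the main advantage over the MutationContext approach.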

Edited Aug 17, 2022 by Michael Eddington