Investigate ways to make better use of parallelism in our CI unit test execution

Problem statement

In recent months, our pipeline duration has once again climbed from 40 minutes to more than double that (83 minutes in the most recent pipeline as of writing). This cripples the team's productivity, especially when we have to deal with Russian doll MRs and merge trains.

Why is this happening?

Currently, our unit test job is split into 8 parallel jobs (10 for the Windows jobs). We first run a test definitions job, which dumps all the unit tests that will be run into a testdefinitions.txt file, and each of the parallel jobs then takes its respective chunk of the tests listed in that file. The problem with this approach is that some tests take a lot longer to execute than others, so chunks with a similar number of tests can have very different run times. Notably, the unit test 3/8 job contains most of the Docker integration tests, with some tests taking more than 70 seconds to finish. The windows 1809 tests 3/10 job takes 43(!) minutes to run (by the way, most of the fault lies with the PowerShell tests, which take 5x-10x longer than the equivalent cmd tests).
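To illustrate why a count-based split ends up unbalanced, here is a minimal sketch in Go. It assumes the chunks are contiguous, roughly equal-sized slices of the test list; the real script may compute its chunks differently. `chunkBounds` and the durations are made up for illustration.

```go
package main

import "fmt"

// chunkBounds returns the half-open range [start, end) of the test list
// that job `index` (1-based) out of `total` would run, assuming an even,
// contiguous split by test count. Hypothetical helper, not the real script.
func chunkBounds(n, index, total int) (start, end int) {
	start = (index - 1) * n / total
	end = index * n / total
	return start, end
}

func main() {
	// Made-up durations in seconds, in the order the tests appear in the file.
	// The slow tests are clustered together, as integration tests tend to be.
	durations := []int{1, 1, 2, 70, 75, 80, 1, 2}
	total := 4

	for job := 1; job <= total; job++ {
		start, end := chunkBounds(len(durations), job, total)
		sum := 0
		for _, d := range durations[start:end] {
			sum += d
		}
		fmt.Printf("job %d/%d: tests [%d,%d) take %ds\n", job, total, start, end, sum)
	}
}
```

Every job gets two tests, but the totals range from a few seconds to a few minutes, and the pipeline is only as fast as the slowest job.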

How can we improve the situation?

Ideally, we'd figure out a smarter split of the test suite, so that we devote more parallel resources to the tests that take longer to run, and run the shorter tests on 1 or 2 jobs. (Too bad we can't have the parallel jobs pull test names from a queue of remaining tests to run.)
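One possible way to approximate a smarter split without a shared queue is to balance the jobs by expected duration instead of test count. The sketch below uses the greedy "longest processing time first" heuristic. It is only an idea for discussion, and it assumes we could record per-test durations from a previous pipeline run, which we don't do today. `timedTest` and `splitByDuration` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// timedTest pairs a test name with its duration from an earlier run.
// Hypothetical type: it assumes we have timing data available.
type timedTest struct {
	Name    string
	Seconds float64
}

// splitByDuration assigns tests to `jobs` buckets greedily: tests are
// sorted by duration (longest first), and each one goes to the bucket
// with the smallest total so far.
func splitByDuration(tests []timedTest, jobs int) [][]timedTest {
	sorted := append([]timedTest(nil), tests...) // do not reorder the caller's slice
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Seconds > sorted[j].Seconds
	})

	buckets := make([][]timedTest, jobs)
	totals := make([]float64, jobs)
	for _, t := range sorted {
		least := 0
		for b := 1; b < jobs; b++ {
			if totals[b] < totals[least] {
				least = b
			}
		}
		buckets[least] = append(buckets[least], t)
		totals[least] += t.Seconds
	}
	return buckets
}

func main() {
	// Made-up test names and durations.
	tests := []timedTest{
		{"TestA", 5}, {"TestB", 4}, {"TestC", 3},
		{"TestD", 3}, {"TestE", 2}, {"TestF", 1},
	}
	for i, bucket := range splitByDuration(tests, 2) {
		total := 0.0
		for _, t := range bucket {
			total += t.Seconds
		}
		fmt.Printf("job %d: %d tests, %.0fs\n", i+1, len(bucket), total)
	}
}
```

The result is only as good as the timing data, and a single very slow test still sets a lower bound on the duration of its job.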

Some improvement ideas:

Segregate integration tests into a separate job. E.g. have integration test x/8 and unit test x/2 jobs. We'd then update downstream jobs to depend on both integration test and unit test, rather than just unit test. One way to achieve this is to adopt a standard nomenclature for the integration tests, so that the go_test_with_coverage_report script can treat them separately and devote more jobs to them.
Figure out the issue with the PowerShell tests. Is it due to a problem with the base image, in which the PowerShell assemblies are not NGen'd?
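For the first idea (segregating integration tests by name), the partitioning step could look roughly like the Go sketch below. The `TestIntegration` prefix is a made-up convention used only for illustration; we would still need to agree on the actual nomenclature, and the real logic would live in the go_test_with_coverage_report script rather than in a Go program.

```go
package main

import (
	"fmt"
	"strings"
)

// integrationPrefix is a hypothetical naming convention, not one we use today.
const integrationPrefix = "TestIntegration"

// partitionTests splits test names into integration and unit tests,
// based purely on the name prefix, preserving the input order.
func partitionTests(names []string) (integration, unit []string) {
	for _, name := range names {
		if strings.HasPrefix(name, integrationPrefix) {
			integration = append(integration, name)
		} else {
			unit = append(unit, name)
		}
	}
	return integration, unit
}

func main() {
	// Made-up test names, standing in for the content of testdefinitions.txt.
	names := []string{
		"TestIntegrationDockerBuild",
		"TestParseConfig",
		"TestIntegrationPowerShellScript",
		"TestShellEscape",
	}
	integration, unit := partitionTests(names)
	fmt.Println("integration:", integration)
	fmt.Println("unit:", unit)
}
```

Each list could then be chunked independently, with more parallel jobs given to the integration list.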

Additional benefits

Having a separate integration tests job would allow us to have separate code coverage artifacts that concern only the integration tests. That could be helpful when reviewing high-risk MRs like #27559 (closed).

cc @gitlab-com/runner-group for ideas/discussion: what are your thoughts on the growth of the pipeline duration and on the suggested paths?

Edited Feb 24, 2021 by Pedro Pombeiro (OOO from July 16th till Aug 7th)