Investigate ways to make better use of parallelism in our CI unit test execution
Problem statement
In recent months, our pipeline duration has once again climbed from 40 minutes to more than double that (83 minutes in the most recent pipeline as of writing). This cripples the team's productivity, especially when we have to deal with Russian doll MRs and merge trains.
Why is this happening?
Currently, our unit test job is split into 8 parallel jobs (10 for the Windows jobs). We first run a tests definitions job, which dumps all the unit tests that will be run into a testdefinitions.txt file, and each of the parallel jobs then takes its respective chunk of the tests listed in that file. The problem with this approach is that some tests take a lot more time to execute than others. Notably, the unit test 3/8 job contains most of the Docker integration tests, with some tests taking more than 70 seconds to finish. The windows 1809 tests 3/10 job takes 43(!) minutes to run (by the way, most of the fault lies with PowerShell tests, which take 5x-10x longer than the equivalent cmd test).
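To make the imbalance concrete, here is a minimal sketch of what a position-based split does when slow tests sit next to each other in the list. It assumes each job takes a contiguous slice of the file, which may not match exactly how our script chunks it. The test names and durations are made up for illustration.

```python
def contiguous_chunk(tests, index, total):
    """Return the index-th (1-based) of `total` contiguous slices of `tests`."""
    size = -(-len(tests) // total)  # ceiling division
    start = (index - 1) * size
    return tests[start:start + size]


# Hypothetical durations in seconds; these are not real measurements.
durations = {
    "TestA": 1, "TestB": 1, "TestDockerPull": 70, "TestDockerBuild": 75,
    "TestC": 2, "TestD": 1, "TestE": 1, "TestF": 2,
}
tests = list(durations)  # stands in for the lines of testdefinitions.txt

# Total runtime of each of 4 parallel jobs under the contiguous split.
job_times = [
    sum(durations[t] for t in contiguous_chunk(tests, i, 4))
    for i in range(1, 5)
]
print(job_times)  # [2, 145, 3, 3]
```

With these numbers, one job carries both slow tests and runs for 145 seconds while the other three finish in 3 seconds or less, so the whole stage waits on a single job.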
How can we improve the situation?
Ideally, we'd figure out a smarter split of the test suite, so that we devote more parallel resources to the tests that take longer to run and run the shorter tests on 1 or 2 jobs. (Too bad we can't have the parallel jobs pull test names from a queue of remaining tests to run.)
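One possible shape for a smarter split, assuming we could record per-test durations from a previous pipeline (we don't collect these today as far as this issue states), is a greedy "slowest first" assignment: sort tests by duration and always hand the next one to the job with the smallest total so far. This is a static approximation of the queue idea, not a real work queue. The function name and the timing data below are hypothetical.

```python
import heapq


def split_by_duration(durations, jobs):
    """Greedy split: assign each test, slowest first, to the least-loaded job.

    `durations` maps test name -> seconds. Returns (buckets, totals), where
    buckets[i] lists the tests for job i and totals[i] is its summed duration.
    """
    heap = [(0, i) for i in range(jobs)]  # (total seconds so far, job index)
    heapq.heapify(heap)
    buckets = [[] for _ in range(jobs)]
    # Sort by descending duration; ties are broken by name for determinism.
    for name, secs in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        total, i = heapq.heappop(heap)
        buckets[i].append(name)
        heapq.heappush(heap, (total + secs, i))
    totals = [sum(durations[n] for n in b) for b in buckets]
    return buckets, totals


# Hypothetical durations in seconds; these are not real measurements.
example = {
    "TestA": 1, "TestB": 1, "TestDockerPull": 70, "TestDockerBuild": 75,
    "TestC": 2, "TestD": 1, "TestE": 1, "TestF": 2,
}
buckets, totals = split_by_duration(example, 4)
print(totals)  # [75, 70, 4, 4]
```

In this toy example the slowest job drops to 75 seconds because the two slow tests land on different jobs. The split is only as good as the recorded timings, so stale data would gradually bring the imbalance back.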
Some improvement ideas:
- Segregate integration tests into a separate job, e.g. have integration test x/8 and unit test x/2 jobs. We'd then update downstream jobs to depend on both integration test and unit test, rather than just unit test. One way to achieve this is to adopt a standard nomenclature for the integration tests, so that the go_test_with_coverage_report script can treat them separately and devote more jobs to them.
- Figure out the issue with PowerShell tests. Is it due to a problem with the base image, in which the PowerShell assemblies are not NGen'd?
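As a rough illustration of the first idea, the sketch below separates a test list by a naming convention and deals each group out to its own set of jobs. The "Integration" marker in the test name is purely an assumed convention (none exists yet), the 3/2 job counts are scaled down from the x/8 and x/2 split suggested above, and the helper names are hypothetical rather than part of go_test_with_coverage_report.

```python
def segregate(tests, marker="Integration"):
    """Split test names into (integration, unit) by an assumed naming marker."""
    integration = [t for t in tests if marker in t]
    unit = [t for t in tests if marker not in t]
    return integration, unit


def round_robin(tests, total):
    """Deal tests out to `total` jobs in turn; returns one list per job."""
    return [tests[i::total] for i in range(total)]


# Hypothetical test names following the assumed convention.
all_tests = [
    "TestParseConfig", "TestIntegrationDockerPull", "TestRetryLogic",
    "TestIntegrationDockerBuild", "TestIntegrationShellExec", "TestTrimPath",
]
integration, unit = segregate(all_tests)

integration_jobs = round_robin(integration, 3)  # more jobs for the slow group
unit_jobs = round_robin(unit, 2)                # fewer jobs for the fast group
print(integration_jobs)
print(unit_jobs)
```

Keeping the two groups in separate lists is also what would let each group produce its own coverage artifact.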
Additional benefits
Having a separate integration tests job would allow us to have separate code coverage artifacts that cover only the integration tests. That could be helpful when reviewing high-risk MRs like #27559 (closed).
cc @gitlab-com/runner-group for ideas and discussion on the growth of the pipeline duration and the suggested paths.