Investigate slow Windows unit test runs
Timings
Host | Target OS | Test | Time |
---|---|---|---|
Linux (CI Docker) | Linux | TestBuildWithDebugTrace/bash | 0.31s |
MacOS | MacOS | TestBuildWithDebugTrace/bash | 0.39s |
MacOS (Vagrant machine) | Windows | TestBuildWithDebugTrace/powershell | 2.78s |
Linux (CI VM) | Windows | TestBuildWithDebugTrace/powershell | 28.04s |
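As a quick sanity check on the two Windows rows, the slowdown can be computed directly from the timings. This is a minimal Go sketch; the `slowdown` helper is a hypothetical name introduced here for illustration and is not part of the runner code base.

```go
package main

import "fmt"

// slowdown reports how many times slower `slow` is than `fast`.
// Both values are durations in seconds.
func slowdown(slow, fast float64) float64 {
	return slow / fast
}

func main() {
	// Timings in seconds for TestBuildWithDebugTrace/powershell,
	// copied from the table above.
	const (
		vagrantVM = 2.78
		ciVM      = 28.04
	)
	fmt.Printf("CI VM vs Vagrant VM: %.1fx\n", slowdown(ciVM, vagrantVM))
	// prints "CI VM vs Vagrant VM: 10.1x"
}
```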
Findings
- It takes over 6 minutes to start running the first Go test (spinning up the VM, downloading cache/artifacts, installing Go packages, etc.).
- It takes 10x as long to run a test on the Shared Windows Runner (2 vCPUs/7.60 GB RAM) as it does on a local Vagrant VM running on a 16" MBP (1 vCPU/2 GB RAM).
- We are currently running 8 jobs in parallel, and can only increase that to 10, due to a limit in Shared Runners.
  - In any case, raising the parallelism to 10 doesn't help, as the longest job still takes 00:28:23.
- Almost all of the time is spent on `syscall`s to Windows. Profile collected with: `go test -cpuprofile cpu.prof -v --count=1 -run TestBuildWithDebugTrace ./executors/shell/...`
- Based on the above, it seems that most of the time is being spent inside the PowerShell script, so I added timestamps before the execution of each command in the executor script. From there, we can see that running an external command like `git lfs` from the PowerShell script is surprisingly slow (800ms). If I run the same command from the PowerShell prompt in the Vagrant machine, it takes 46ms: `PS C:\GitLab-Runner> Measure-Command { git lfs version | Out-Default }` prints `git-lfs/2.10.0 (GitHub; windows amd64; go 1.12.7; git a526ba6b)` ... `TotalMilliseconds : 46.7342`
- Switching stages seems to take ~5 seconds (vs ~250ms on the local Vagrant VM). In a unit test that takes a total of 52 seconds, switching job stages consumes 24 seconds.
- There's a fixed cost of starting a PowerShell session, which is much higher than for bash. In my Vagrant VM, running `Measure-Command { powershell -command echo 1 | Out-Default }` reports 219ms vs 12ms for `time bash -c echo 1`.
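The fixed shell startup cost from the last finding can also be measured from Go, which is closer to how the runner spawns its shells than an interactive prompt is. The sketch below is a rough illustration under stated assumptions: `spawnMillis` and `medianMillis` are hypothetical helpers, the run count of 5 is arbitrary, and it assumes `powershell` and/or `bash` are on `PATH` (a missing shell is reported and skipped).

```go
package main

import (
	"fmt"
	"os/exec"
	"sort"
	"time"
)

// medianMillis returns the median of the samples (milliseconds).
// It returns 0 for an empty slice.
func medianMillis(samples []float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samples...) // copy, keep input intact
	sort.Float64s(sorted)
	mid := len(sorted) / 2
	if len(sorted)%2 == 1 {
		return sorted[mid]
	}
	return (sorted[mid-1] + sorted[mid]) / 2
}

// spawnMillis runs the command `runs` times and returns the median
// wall-clock time per run in milliseconds. Command output is discarded.
func spawnMillis(runs int, name string, args ...string) (float64, error) {
	samples := make([]float64, 0, runs)
	for i := 0; i < runs; i++ {
		start := time.Now()
		if err := exec.Command(name, args...).Run(); err != nil {
			return 0, fmt.Errorf("running %s: %w", name, err)
		}
		samples = append(samples, float64(time.Since(start).Microseconds())/1000)
	}
	return medianMillis(samples), nil
}

func main() {
	// Trivial commands, so the measurement is dominated by shell startup.
	shells := [][]string{
		{"powershell", "-command", "echo 1"},
		{"bash", "-c", "echo 1"},
	}
	for _, sh := range shells {
		ms, err := spawnMillis(5, sh[0], sh[1:]...)
		if err != nil {
			fmt.Printf("%-10s skipped: %v\n", sh[0], err)
			continue
		}
		fmt.Printf("%-10s median %.1f ms over 5 runs\n", sh[0], ms)
	}
}
```

Running the same binary on the Vagrant VM and on a Shared Runner would show whether process startup alone accounts for the gap. Taking the median rather than the mean keeps a single cold-start outlier from skewing the result.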
Questions
- Why does PowerShell execution/unit test execution seem to be an order of magnitude slower on the Shared Runner than on the Vagrant VM?
Unexplored ideas
- Could be useful to spin up a Windows Shared Runner VM just for running a few tests on the command line, to see if the perf numbers match.
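To compare numbers between two hosts, the per-test durations that `go test -v` prints (lines such as `--- PASS: TestName (0.31s)`) can be extracted and matched by test name. This is a minimal sketch of that idea; `parseTestTimes` is a hypothetical helper, and the two sample outputs are made up from the durations in the Timings table, not captured logs.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// resultLine matches `go test -v` result lines, for example:
//     --- PASS: TestBuildWithDebugTrace/powershell (2.78s)
var resultLine = regexp.MustCompile(`(?m)^[ \t]*--- (?:PASS|FAIL|SKIP): (\S+) \((\d+(?:\.\d+)?)s\)`)

// parseTestTimes maps each test name found in the output to its
// reported duration in seconds.
func parseTestTimes(output string) map[string]float64 {
	times := make(map[string]float64)
	for _, m := range resultLine.FindAllStringSubmatch(output, -1) {
		secs, err := strconv.ParseFloat(m[2], 64)
		if err != nil {
			continue // skip lines whose duration cannot be parsed
		}
		times[m[1]] = secs
	}
	return times
}

func main() {
	// Illustrative outputs only; durations taken from the Timings table.
	vagrant := "=== RUN   TestBuildWithDebugTrace/powershell\n" +
		"    --- PASS: TestBuildWithDebugTrace/powershell (2.78s)\n"
	shared := "    --- PASS: TestBuildWithDebugTrace/powershell (28.04s)\n"

	local, remote := parseTestTimes(vagrant), parseTestTimes(shared)
	for name, l := range local {
		if r, ok := remote[name]; ok {
			fmt.Printf("%s: %.2fs vs %.2fs (%.1fx)\n", name, l, r, r/l)
		}
	}
}
```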