GitLab Runner for Windows causes high CPU load on KVM host
I have GitLab Runner 10.2.0 running on Windows 2016 in KVM via libvirt on Debian 9. While GitLab Runner is running, CPU usage on the host is 30%-40%, but the task manager on the Windows Server doesn't show any high CPU usage.
The reason for the high load on the host seems to be a huge number of context switches. dstat reports about 7.3k context switches and 1.7k to 1.9k interrupts per sample. Stopping GitLab Runner makes the context switches drop to around 1.6k and the interrupts to just under 700.
With the Runner running:
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
4 5 91 0 0 0| 0 16k|2553B 3180B| 0 0 |1667 7299
7 5 87 0 0 1| 0 0 |1760B 3265B| 0 0 |1903 7369
5 6 89 0 0 0| 0 44k|5944B 5917B| 0 0 |1803 7299
5 5 89 0 0 0| 0 16k|1662B 2536B| 0 0 |1913 7271
With the Runner stopped:
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
2 1 97 0 0 0| 0 16k|3418B 3846B| 0 0 | 695 1677
1 1 95 3 0 0| 0 16M|4484B 5720B| 0 0 | 696 1691
4 1 95 0 0 1| 0 0 | 122B 720B| 0 0 | 667 1529
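To put numbers on the difference, the int and csw columns of the two dstat samples can be averaged. This is only a minimal Python sketch: the helper name mean_int_csw is made up for illustration, the rows are copied from the output above, and it assumes dstat's default one-second interval so that each row is a per-second rate.

```python
from statistics import mean

# dstat rows copied from the two samples above (header lines omitted).
RUNNING = """\
4 5 91 0 0 0| 0 16k|2553B 3180B| 0 0 |1667 7299
7 5 87 0 0 1| 0 0 |1760B 3265B| 0 0 |1903 7369
5 6 89 0 0 0| 0 44k|5944B 5917B| 0 0 |1803 7299
5 5 89 0 0 0| 0 16k|1662B 2536B| 0 0 |1913 7271
"""

STOPPED = """\
2 1 97 0 0 0| 0 16k|3418B 3846B| 0 0 | 695 1677
1 1 95 3 0 0| 0 16M|4484B 5720B| 0 0 | 696 1691
4 1 95 0 0 1| 0 0 | 122B 720B| 0 0 | 667 1529
"""


def mean_int_csw(sample):
    """Return (mean interrupts, mean context switches) for a dstat sample.

    The last '|'-separated group of each row holds the 'int' and 'csw' columns.
    """
    rows = [line.split("|")[-1].split() for line in sample.strip().splitlines()]
    ints = [int(row[0]) for row in rows]
    csws = [int(row[1]) for row in rows]
    return mean(ints), mean(csws)


run_int, run_csw = mean_int_csw(RUNNING)
stop_int, stop_csw = mean_int_csw(STOPPED)

print(f"running: int={run_int:.0f} csw={run_csw:.0f}")
print(f"stopped: int={stop_int:.0f} csw={stop_csw:.0f}")
print(f"csw ratio: {run_csw / stop_csw:.1f}x, int ratio: {run_int / stop_int:.1f}x")
```

With so few rows the averages are only indicative, but they show the context switch rate changing by a larger factor than the interrupt rate.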
In /proc/interrupts I can see that most of these interrupts seem to be timer interrupts:
# grep LOC /proc/interrupts; sleep 1; grep LOC /proc/interrupts
LOC: 214735335 229250283 229962515 214008402 Local timer interrupts
LOC: 214735837 229250667 229962913 214008715 Local timer interrupts
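The per-CPU timer interrupt rate follows from subtracting the first LOC line from the second. A small Python sketch of that arithmetic, using the two lines above as input (the function names loc_counts and loc_deltas are hypothetical, and the interval is only approximately one second because the commands themselves take some time):

```python
# The two LOC lines from /proc/interrupts, taken roughly one second apart.
BEFORE = "LOC: 214735335 229250283 229962515 214008402 Local timer interrupts"
AFTER = "LOC: 214735837 229250667 229962913 214008715 Local timer interrupts"


def loc_counts(line):
    """Extract the per-CPU counters: the purely numeric fields after 'LOC:'."""
    return [int(tok) for tok in line.split()[1:] if tok.isdigit()]


def loc_deltas(before, after):
    """Per-CPU increase in local timer interrupts between two samples."""
    return [b - a for a, b in zip(loc_counts(before), loc_counts(after))]


deltas = loc_deltas(BEFORE, AFTER)
print(deltas)       # [502, 384, 398, 313]
print(sum(deltas))  # 1597
```

About 1.6k local timer interrupts across the four CPUs in that interval is close to the total in dstat's int column while the Runner is running.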
I traced this to GitLab Runner by running an energy trace on the Windows Server with powercfg -energy -duration 5. It reports the following:
Platform Timer Resolution:Outstanding Timer Request
A program or service has requested a timer resolution smaller than the platform maximum timer resolution.
Requested Period 10000
Requesting Process ID 2548
Requesting Process Path \Device\HarddiskVolume2\GitLab-Runner\gitlab-runner.exe
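For reference, the requested period can be converted into a tick rate. The sketch below assumes the value is given in 100 ns units, which is how the powercfg energy report labels timer resolution, and that the Windows default period is 156250 units (15.625 ms); both are assumptions not stated in the output above, and the function names are illustrative only.

```python
UNITS_PER_SECOND = 10_000_000  # number of 100 ns units in one second (assumed unit)
DEFAULT_PERIOD = 156_250       # assumed Windows default timer period: 15.625 ms
REQUESTED_PERIOD = 10_000      # "Requested Period" from the powercfg report


def period_ms(units):
    """Convert a timer period in 100 ns units to milliseconds."""
    return units * 100 / 1_000_000


def ticks_per_second(units):
    """Timer ticks per second for a period given in 100 ns units."""
    return UNITS_PER_SECOND / units


for name, units in (("default", DEFAULT_PERIOD), ("requested", REQUESTED_PERIOD)):
    print(f"{name}: {period_ms(units)} ms, {ticks_per_second(units)} ticks/s")
```

Under these assumptions gitlab-runner.exe is asking for a 1 ms timer, that is 1000 ticks per second instead of 64, which would be consistent with the extra timer interrupts seen on the host.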
There was a similar issue (#976, closed) some time ago, but no real solution was found. Maybe the timer resolution information above helps to track it down.