x86 TCG acceleration running on s390x with -smp > host cpus slowed down by x10
Host environment
- Operating system: Ubuntu 23.04
- OS/kernel version: 5.15.0-60-generic
- Architecture: s390x
- QEMU flavor: qemu-system-x86_64 (in TCG mode)
- QEMU version: latest master v7.2.0-1688-ge1f9f73b
- QEMU command line: qemu-system-x86_64 -machine 'q35,accel=tcg' -no-user-config -nodefaults -m '256' -smp '2,sockets=2,cores=1,threads=1' -display 'none' -serial 'stdio' -chardev 'pty,id=charserial1' -device 'isa-serial,chardev=charserial1,id=serial1' -drive 'file=/usr/share/OVMF/OVMF_CODE.ms.fd,if=pflash,format=raw,unit=0,readonly=on' -drive 'file=/tmp/tmpzqleta0y,if=pflash,format=raw,unit=1,readonly=off' -global 'ICH9-LPC.disable_s3=1'
Emulated/Virtualized environment
- Operating system: none, just boot into OVMF
- OS/kernel version: n/a
- Architecture: x86
Description of problem
This boots a trivial guest using OVMF. Under the conditions listed below, it runs ~10x slower.
I found this because it breaks our tests of qemu 7.2 (which is affected because Debian added the offending change as a backport): they run an order of magnitude slower.
I was tracing it down (insert a long strange trip here) and found that it occurs:
- only with patch dab30fbe "acpi: cpuhp: fix guest-visible maximum access size to the legacy reg block" applied
- latest master is still affected
- only with s390x running emulation of x86
  - emulating x86 on ppc64 didn't show the same behavior
- only with -smp > host cpus
  - smp 2 with 1 host cpu => slow
  - smp 4 with 2 host cpu => slow
  - any case where host cpu >= smp => fast
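The -smp condition above can be checked before launching a guest. The following is only a minimal sketch: the function names are hypothetical (not part of QEMU), and it assumes that "host cpus" means the CPUs the QEMU process is allowed to run on, which the report does not spell out.

```python
import os


def usable_host_cpus() -> int:
    """Return the number of CPUs this process may run on.

    Assumption: the scheduler affinity mask is the relevant number;
    fall back to the plain CPU count where it is not available.
    """
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:  # platform without sched_getaffinity
        return os.cpu_count() or 1


def smp_exceeds_host(smp: int, host_cpus: int) -> bool:
    """True when the guest gets more vCPUs than the host has CPUs,
    the only configuration in which the slowdown was observed."""
    return smp > host_cpus


if __name__ == "__main__":
    host = usable_host_cpus()
    for smp in (1, 2, 4):
        flag = smp_exceeds_host(smp, host)
        print(f"-smp {smp} with {host} host cpu(s): exceeds host = {flag}")
```

With the values from the list, `smp_exceeds_host(2, 1)` and `smp_exceeds_host(4, 2)` are the slow cases, and any call where `host_cpus >= smp` is a fast one.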
On a 2964 s390x machine, the good case takes ~5-6 seconds on average. The bad case takes close to 60s, which is the timeout of the automated tests.
We all know -smp shouldn't be >host-cpus, and I totally admit that this is the definition of an edge case. But I do not know what else might be affected and this just happened to be what the test does by default - and a slowdown by x10 seems too much even for edge cases to be just ignored. And while we could just bump up the timeout (and probably will as an interim workaround) I wanted to file it here for your awareness.
Steps to reproduce
You can recreate the same by using the commandline above and timing things on your own.
Or you can use the autopkgtest of edk2 in Ubuntu, which showed this first.
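For timing by hand, something along these lines may help. It is a rough sketch, not what the autopkgtest does: the helper names are hypothetical, the `Shell>` marker assumes the firmware ends up at a UEFI shell prompt on the serial console, setting `sockets` equal to the `-smp` count is a generalization of the single command line in the report, and feeding `/dev/null` as stdin to `-serial stdio` is also an assumption. It relies on `select` working on pipes, so it is POSIX only.

```python
import os
import select
import subprocess
import sys
import time


def qemu_command(smp, code_fd, vars_fd):
    """Build the report's command line as an argument list for a given -smp."""
    return [
        "qemu-system-x86_64",
        "-machine", "q35,accel=tcg",
        "-no-user-config", "-nodefaults",
        "-m", "256",
        # Assumption: one socket per vCPU, as in the report's -smp 2 case.
        "-smp", f"{smp},sockets={smp},cores=1,threads=1",
        "-display", "none",
        "-serial", "stdio",
        "-chardev", "pty,id=charserial1",
        "-device", "isa-serial,chardev=charserial1,id=serial1",
        "-drive", f"file={code_fd},if=pflash,format=raw,unit=0,readonly=on",
        "-drive", f"file={vars_fd},if=pflash,format=raw,unit=1,readonly=off",
        "-global", "ICH9-LPC.disable_s3=1",
    ]


def seconds_until(cmd, marker, timeout):
    """Run cmd and return the seconds until marker (bytes) shows up on its
    output, or None if the timeout expires or the process exits first."""
    start = time.monotonic()
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    fd = proc.stdout.fileno()
    seen = b""
    try:
        while True:
            remaining = timeout - (time.monotonic() - start)
            if remaining <= 0:
                return None
            ready, _, _ = select.select([fd], [], [], remaining)
            if not ready:
                return None  # timed out while waiting for output
            chunk = os.read(fd, 4096)
            if not chunk:
                return None  # process closed its output without the marker
            seen += chunk
            if marker in seen:
                return time.monotonic() - start
    finally:
        proc.kill()
        proc.wait()
        proc.stdout.close()


if __name__ == "__main__":
    # Usage: script.py SMP OVMF_CODE_IMAGE WRITABLE_VARS_COPY
    smp, code_fd, vars_fd = int(sys.argv[1]), sys.argv[2], sys.argv[3]
    elapsed = seconds_until(qemu_command(smp, code_fd, vars_fd), b"Shell>", 120.0)
    print("timeout or early exit" if elapsed is None else f"{elapsed:.2f}s")
```

Running it once with an `-smp` value above the host CPU count and once with a value at or below it should make the difference visible, if the marker assumption holds for the firmware image in use.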
Additional information
Only signed OVMF cases are affected, while aavmf and the other OVMF cases run at more or less the same speed.
1 CPU / 1GB Memory
   7.0      7.2    test
 6.54s   58.32s    test_ovmf_ms
 6.72s   56.96s    test_ovmf_4m_ms
 7.54s   55.47s    test_ovmf_4m_secboot
 7.56s   49.88s    test_ovmf_secboot
 7.01s   39.79s    test_ovmf32_4m_secboot
 7.38s    7.43s    test_aavmf32
 7.27s    7.30s    test_aavmf
 7.26s    7.26s    test_aavmf_snakeoil
 5.83s    5.95s    test_ovmf_4m
 5.61s    5.81s    test_ovmf_q35
 5.51s    5.64s    test_ovmf_pc
 5.26s    5.42s    test_ovmf_snakeoil
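To put numbers on the slowdown per test, the ratios can be computed from the timings above. This is just a small sketch; the data is copied from the table, and the names `TIMINGS` and `slowdown` are made up for illustration.

```python
# (test name, seconds in the 7.0 column, seconds in the 7.2 column)
TIMINGS = [
    ("test_ovmf_ms", 6.54, 58.32),
    ("test_ovmf_4m_ms", 6.72, 56.96),
    ("test_ovmf_4m_secboot", 7.54, 55.47),
    ("test_ovmf_secboot", 7.56, 49.88),
    ("test_ovmf32_4m_secboot", 7.01, 39.79),
    ("test_aavmf32", 7.38, 7.43),
    ("test_aavmf", 7.27, 7.30),
    ("test_aavmf_snakeoil", 7.26, 7.26),
    ("test_ovmf_4m", 5.83, 5.95),
    ("test_ovmf_q35", 5.61, 5.81),
    ("test_ovmf_pc", 5.51, 5.64),
    ("test_ovmf_snakeoil", 5.26, 5.42),
]


def slowdown(before: float, after: float) -> float:
    """Return how many times longer the run took in the second column."""
    return after / before


if __name__ == "__main__":
    for name, old, new in TIMINGS:
        print(f"{name:24s} {slowdown(old, new):5.2f}x")
```

For the `_ms` and `_secboot` tests the ratio falls between roughly 5.7x and 8.9x; for all other tests it stays below 1.05x.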
Highlighting @cborntra since it is somewhat s390x related and @mjt0k as the patch is applied as backport in Debian. I didn't find the handle of Laszlo (Author) to highlight him as well.