x86 TCG acceleration running on s390x with -smp > host cpus slowed down by x10

Host environment

  • Operating system: Ubuntu 23.04
  • OS/kernel version: 5.15.0-60-generic
  • Architecture: s390x
  • QEMU flavor: qemu-system-x86_64 (in TCG mode)
  • QEMU version: latest master v7.2.0-1688-ge1f9f73b
  • QEMU command line: qemu-system-x86_64 -machine 'q35,accel=tcg' -no-user-config -nodefaults -m '256' -smp '2,sockets=2,cores=1,threads=1' -display 'none' -serial 'stdio' -chardev 'pty,id=charserial1' -device 'isa-serial,chardev=charserial1,id=serial1 -drive 'file=/usr/share/OVMF/OVMF_CODE.ms.fd,if=pflash,format=raw,unit=0,readonly=on -drive 'file=/tmp/tmpzqleta0y,if=pflash,format=raw,unit=1,readonly=off -global 'ICH9-LPC.disable_s3=1'

Emulated/Virtualized environment

  • Operating system: none, just boot into OVMF
  • OS/kernel version: n/a
  • Architecture: x86

Description of problem

This boots up a trivial guest using OVMF, when the conditions below are given it runs ~10x slower.

I have found this breaking our tests of qemu 7.2 (which due to Debian adding the offending change as backport is affected) by runnig an order of magnitude slower.

I was tracing it down (insert a long strange trip here) and found that it occurs:

  • only with patch dab30fbe "acpi: cpuhp: fix guest-visible maximum access size to the legacy reg block" applied
    • latest master is still affetced
  • only with s390x running emulation of x86
    • emulating x86 on ppc64 didn't show the same behavior
  • only with -smp > host cpus
    • smp 2 with 1 host cpu => slow
    • smp 4 with 2 host cpu => slow
    • any case where host cpu >= smp => fast

On average good cases are on a 2964 s390x machine taking ~5-6 seconds for the good case. The bad case is close to 60s which is the timeout of the automated tests.

We all know -smp shouldn't be >host-cpus, and I totally admit that this is the definition of an edge case. But I do not know what else might be affected and this just happened to be what the test does by default - and a slowdown by x10 seems too much even for edge cases to be just ignored. And while we could just bump up the timeout (and probably will as an interim workaround) I wanted to file it here for your awareness.

Steps to reproduce

You can recreate the same by using the commandline above and timing things on your own.

Or you can use the autopkgtest of edk2 in Ubuntu which have shown this first.

Additional information

Only signed OVMF cases are affected, while aavmf and other OVMF are more or less on the same speed.

1 CPU / 1GB Memory
7.0     7.2
6.54s   58.32s test_ovmf_ms
6.72s   56.96s test_ovmf_4m_ms
7.54s   55.47s test_ovmf_4m_secboot
7.56s   49.88s test_ovmf_secboot
7.01s   39.79s test_ovmf32_4m_secboot
7.38s    7.43s  test_aavmf32
7.27s    7.30s  test_aavmf
7.26s    7.26s  test_aavmf_snakeoil
5.83s    5.95s  test_ovmf_4m
5.61s    5.81s  test_ovmf_q35
5.51s    5.64s  test_ovmf_pc
5.26s    5.42s  test_ovmf_snakeoil

Highlighting @cborntra since it is somewhat s390x related and @mjt0k as the patch is applied as backport in Debian. I didn't find the handle of Laszlo (Author) to highlight him as well.