Performance Regression in QEMU (amd64 Emulating LoongArch64) from 8.0.4 to 9.0.2

Host environment

  • Operating system: Arch Linux
  • OS/kernel version: Linux lgarch 6.10.3-arch1-1 #1 SMP PREEMPT_DYNAMIC Sun, 04 Aug 2024 05:11:32 +0000 x86_64 GNU/Linux
  • Architecture: x86_64
  • QEMU flavor: qemu-loongarch64
  • QEMU version: 9.0.2
  • QEMU command line:
    qemu-loongarch64 $CHROOT/usr/lib/p7zip/7z b

Emulated/Virtualized environment

  • Operating system: Arch Linux
  • OS/kernel version: N/A
  • Architecture: loongarch64

Description of problem

Previous performance: In May 2023, we used QEMU 8.0.4 for qemu-user emulation, and performance was satisfactory. At that time, neither the emulated system nor QEMU supported LSX (Loongson SIMD Extensions). The results are shown in Figure A.

Current performance: We recently upgraded to QEMU 9.0.2. Both the emulated system and QEMU now support the vector instruction sets. However, benchmark results dropped to less than 60% of the previous figures, as shown in Figure B.

The slowdown does not appear to be caused by LSX itself: building without LSX on the new version makes performance even worse. That said, the two setups do differ significantly in LSX support.

Steps to Reproduce:

  1. Use QEMU 8.0.4 on an x86_64 host to run the benchmark (qemu-loongarch64 $CHROOT/usr/lib/p7zip/7z b)
  2. Upgrade to QEMU 9.0.2
  3. Run the same benchmark and compare the results
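
The comparison in the steps above could be scripted roughly as follows. This is only a sketch: `QEMU_OLD` and `QEMU_NEW` are hypothetical environment variables pointing at separately installed 8.0.4 and 9.0.2 `qemu-loongarch64` binaries, and the helper names are made up for illustration. It also assumes the `7z b` summary ends with a `Tot:` line whose last field is the overall rating, as p7zip builds typically print; adjust the parsing if your output differs.

```shell
#!/bin/sh
# Sketch: compare the `7z b` total rating under two qemu-loongarch64 builds.
# QEMU_OLD / QEMU_NEW are hypothetical paths (assumptions, not from the report).

# Read `7z b` output on stdin and print the overall rating,
# assumed to be the last field of the "Tot:" line.
tot_rating() {
    awk '/^Tot:/ { print $NF }'
}

# Print $1 as an integer percentage of $2 (truncated).
percent_of() {
    awk -v new="$1" -v old="$2" 'BEGIN { printf "%d\n", new * 100 / old }'
}

# Run the benchmark from the report with the given QEMU binary.
run_bench() {
    "$1" "$CHROOT/usr/lib/p7zip/7z" b | tot_rating
}

# Only run the comparison when both binaries are configured.
if [ -n "$QEMU_OLD" ] && [ -n "$QEMU_NEW" ]; then
    old=$(run_bench "$QEMU_OLD")
    new=$(run_bench "$QEMU_NEW")
    echo "old=$old new=$new new/old=$(percent_of "$new" "$old")%"
fi
```

For example, `QEMU_OLD=/opt/qemu-8.0.4/bin/qemu-loongarch64 QEMU_NEW=/usr/bin/qemu-loongarch64 sh compare.sh` (paths are placeholders) would print both ratings and the new rating as a percentage of the old one.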

Additional information

We use systemd-nspawn and qemu-binfmt for containerized operations. A clean chroot is available from the lcpu release.

Figure A: performance in May 2023

Figure B: current performance

Edited by leavelet