
Secondary CPUs hang after savevm/loadvm on ppc64

Host environment

  • Operating system: openSUSE Tumbleweed
  • OS/kernel version: 6.15.8-1-default
  • Architecture: x86_64, but also ppc64le
  • QEMU flavor: qemu-system-ppc64
  • QEMU version: git 5836af07
  • QEMU command line:
    qemu-system-ppc64 -only-migratable -m 4096 -machine usb=off -cpu power8 -smp 4 -drive file=opensuse-Tumbleweed-ppc64le-20250810-textmode@ppc64le.qcow2,if=virtio -serial stdio

Emulated/Virtualized environment

  • Operating system: openSUSE Tumbleweed
  • OS/kernel version: 6.15.8-1-default
  • Architecture: ppc64le

Description of problem

Once the system has booted, run "savevm running", then "loadvm running" in the monitor. The guest is now stuck and does not accept input. After some time, the kernel reports stalls.

This happens on both KVM and TCG.

Cause

Due to fb802acd ("ppc/spapr: Fix RTAS stopped state"), secondary CPUs are quiesced on reset (env->quiesced = true). loadvm needs to overwrite that flag with the saved value; otherwise the secondary CPUs suddenly appear stuck to the guest.

env.quiesced is not part of vmstate_ppc_cpu, so for secondary CPUs it remains true after loadvm. Adding it to the VMState fixes it:


diff --git a/target/ppc/machine.c b/target/ppc/machine.c
index d72e5ecb94..78bc1a98ff 100644
--- a/target/ppc/machine.c
+++ b/target/ppc/machine.c
@@ -671,7 +671,7 @@ static const VMStateDescription vmstate_bhrb = {
 
 const VMStateDescription vmstate_ppc_cpu = {
     .name = "cpu",
-    .version_id = 5,
+    .version_id = 6,
     .minimum_version_id = 5,
     .pre_save = cpu_pre_save,
     .post_load = cpu_post_load,
@@ -698,6 +698,9 @@ const VMStateDescription vmstate_ppc_cpu = {
         /* Backward compatible internal state */
         VMSTATE_UINTTL(env.hflags_compat_nmsr, PowerPCCPU),
 
+        /* "RTAS stopped" state, independent of internal halted state */
+        VMSTATE_BOOL_V(env.quiesced, PowerPCCPU, 6),
+
         VMSTATE_END_OF_LIST()
     },
     .subsections = (const VMStateDescription * const []) {

I did not submit this because it isn't backwards-compatible: migration to older QEMU will fail, and loading snapshots from older QEMU will still result in stuck CPUs.

From what I can tell, commits 96746f7a and fb802acd did not take backwards compatibility of migration into account. CCing @npiggin as author.

FTR for testing: For working snapshots on POWER8, https://marc.info/?l=qemu-devel&m=175517133831488&w=2 is needed as well.
