Secondary CPUs hang after savevm/loadvm on ppc64
Host environment
- Operating system: openSUSE Tumbleweed
- OS/kernel version: 6.15.8-1-default
- Architecture: x86_64, but also ppc64le
- QEMU flavor: qemu-system-ppc64
- QEMU version: git 5836af07
- QEMU command line:
qemu-system-ppc64 -only-migratable -m 4096 -machine usb=off -cpu power8 -smp 4 -drive file=opensuse-Tumbleweed-ppc64le-20250810-textmode@ppc64le.qcow2,if=virtio -serial stdio
Emulated/Virtualized environment
- Operating system: openSUSE Tumbleweed
- OS/kernel version: 6.15.8-1-default
- Architecture: ppc64le
Description of problem
Once the system has booted, run "savevm running", then "loadvm running" in the monitor. The guest is now stuck and does not accept input. After some time, the kernel reports stalls.
This happens on both KVM and TCG.
Cause
Since fb802acd ("ppc/spapr: Fix RTAS stopped state"), secondary CPUs
are quiesced on reset (env->quiesced = true). loadvm needs to overwrite
that value, otherwise the CPUs suddenly appear stuck to the guest.
env.quiesced is not part of vmstate_ppc_cpu, so for secondary CPUs it
remains true after loadvm. Adding it to the VMState fixes it:
diff --git a/target/ppc/machine.c b/target/ppc/machine.c
index d72e5ecb94..78bc1a98ff 100644
--- a/target/ppc/machine.c
+++ b/target/ppc/machine.c
@@ -671,7 +671,7 @@ static const VMStateDescription vmstate_bhrb = {
const VMStateDescription vmstate_ppc_cpu = {
.name = "cpu",
- .version_id = 5,
+ .version_id = 6,
.minimum_version_id = 5,
.pre_save = cpu_pre_save,
.post_load = cpu_post_load,
@@ -698,6 +698,9 @@ const VMStateDescription vmstate_ppc_cpu = {
/* Backward compatible internal state */
VMSTATE_UINTTL(env.hflags_compat_nmsr, PowerPCCPU),
+ /* "RTAS stopped" state, independent of internal halted state */
+ VMSTATE_BOOL_V(env.quiesced, PowerPCCPU, 6),
+
VMSTATE_END_OF_LIST()
},
.subsections = (const VMStateDescription * const []) {
I did not submit this patch because it is not backwards-compatible: migration to an older QEMU will fail, and loading snapshots taken with an older QEMU will still result in stuck CPUs.
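To make the two compatibility problems concrete, here is a small standalone toy model in C. It is not QEMU code: ToyCPU, ToySnapshot and all toy_* functions are made up for illustration, and the only things taken from the report are the version numbers 5 and 6, the "quiesced on reset" behaviour, and the idea that a version-gated field is only read from streams of that version or newer. It assumes loadvm resets the CPU before loading the state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the per-CPU state; not QEMU's real CPUPPCState. */
typedef struct ToyCPU {
    uint64_t nip;   /* some state that has always been part of the stream */
    bool quiesced;  /* "RTAS stopped" flag, set on reset for secondaries */
} ToyCPU;

/* Toy snapshot: a version tag plus the fields that version carries. */
typedef struct ToySnapshot {
    int version_id;
    uint64_t nip;
    bool quiesced;  /* only written and read when version_id >= 6 */
} ToySnapshot;

/* Reset: secondary CPUs start out quiesced, as described in the report. */
void toy_cpu_reset(ToyCPU *cpu, bool is_secondary)
{
    cpu->nip = 0;
    cpu->quiesced = is_secondary;
}

/* Save: a version 5 stream simply has no quiesced field. */
ToySnapshot toy_cpu_save(const ToyCPU *cpu, int version_id)
{
    ToySnapshot s = { .version_id = version_id, .nip = cpu->nip,
                      .quiesced = false };
    if (version_id >= 6) {
        s.quiesced = cpu->quiesced;
    }
    return s;
}

/* Load: reject streams newer than we support, skip fields the stream lacks. */
bool toy_cpu_load(ToyCPU *cpu, const ToySnapshot *s, int supported_version)
{
    assert(cpu && s);
    if (s->version_id > supported_version) {
        return false;   /* an older QEMU cannot read a newer stream */
    }
    cpu->nip = s->nip;
    if (s->version_id >= 6) {
        cpu->quiesced = s->quiesced;
    }
    /* For version 5 streams, quiesced keeps whatever reset left behind. */
    return true;
}

/* A running secondary CPU is saved, then loaded into a freshly reset CPU. */
bool toy_secondary_stuck_after_loadvm(int version_id)
{
    ToyCPU running, restored;
    ToySnapshot snap;

    toy_cpu_reset(&running, true);
    running.quiesced = false;       /* the guest has started this CPU */
    running.nip = 0x1000;
    snap = toy_cpu_save(&running, version_id);

    toy_cpu_reset(&restored, true); /* assumed: loadvm resets first */
    toy_cpu_load(&restored, &snap, 6);
    return restored.quiesced;
}

/* Would a loader that only knows version 5 accept this stream? */
bool toy_old_loader_accepts(int version_id)
{
    ToyCPU cpu;
    ToySnapshot snap;

    toy_cpu_reset(&cpu, true);
    snap = toy_cpu_save(&cpu, version_id);
    return toy_cpu_load(&cpu, &snap, 5);
}
```

In this model, toy_secondary_stuck_after_loadvm(5) returns true (old snapshot, CPU stays quiesced) and toy_secondary_stuck_after_loadvm(6) returns false, while toy_old_loader_accepts(6) returns false, which corresponds to the failing migration to older QEMU.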
From what I can tell, commits 96746f7a and fb802acd did not take backwards compatibility of migration into account. CCing @npiggin as the author.
For the record, for testing: to get working snapshots on POWER8, https://marc.info/?l=qemu-devel&m=175517133831488&w=2 is needed as well.