s390x TCG migration failure

Host environment

  • Operating system: Fedora 37

  • OS/kernel version: Linux 6.3.0-20230308.rc1.git1.349f8550f08e.300.fc37.s390x

  • Architecture: s390x

  • QEMU flavor: qemu-system-s390x

  • QEMU version: v8.0.0-rc1

  • QEMU command line:

    Migration from:

    /home/user/qemu-build-0-wt/qemu-system-s390x -nodefaults -nographic -machine s390-ccw-virtio,accel=tcg -chardev stdio,id=con0 -device sclpconsole,chardev=con0 -kernel s390x/migration-skey.elf -smp 1 -append --sequential -initrd /tmp/tmp.HoGzEivrOK -chardev socket,id=mon1,path=/tmp/mig-helper-qmp1.nUNgNPK1sd,server=on,wait=off -mon chardev=mon1,mode=control

    Migration to:

    /home/user/qemu-build-0-wt/qemu-system-s390x -nodefaults -nographic -machine s390-ccw-virtio,accel=tcg -chardev stdio,id=con0 -device sclpconsole,chardev=con0 -kernel s390x/migration-skey.elf -smp 1 -append --sequential -initrd /tmp/tmp.HoGzEivrOK -chardev socket,id=mon2,path=/tmp/mig-helper-qmp2.SW9g9dAvSp,server=on,wait=off -mon chardev=mon2,mode=control -incoming unix:/tmp/mig-helper-socket.RdiNr6ODZ4

Emulated/Virtualized environment

  • Operating system: kvm-unit-tests (s390x migration tests)
  • OS/kernel version: eab2fcf3355ec8f087f75c6a167756e9e3ad39ed
  • Architecture: s390x

Description of problem

We're seeing failures when running the s390x migration tests from kvm-unit-tests under TCG.

Some initial findings:

After migration, a control block header that the test code accesses appears to be all zeros, which causes an unexpected exception.

I bisected the failure to c8df4a7a ("migration: Split save_live_pending() into state_pending_*"). The failure persists after applying the fix e2647050 ("migration: I messed state_pending_exact/estimate") on top of c8df4a7a.
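Because the failure is intermittent, a bisection like this has to run the test several times per commit before calling the commit good. The following is a minimal sketch of a per-commit check that follows the exit-status convention of git bisect run (0 = good, 1 to 124 = bad, 125 = skip). It is not the procedure actually used for the bisection above: classify_commit is a hypothetical helper, and the build and test command strings are placeholders.

```sh
#!/bin/sh
# classify_commit RUNS BUILD_CMD TEST_CMD
# Hypothetical helper for use from a `git bisect run` script.
# Returns 0 (good), 1 (bad) or 125 (skip), matching what
# `git bisect run` expects from the script it invokes.
classify_commit() {
    runs=$1
    build_cmd=$2
    test_cmd=$3

    # A commit that does not build cannot be judged, so ask bisect to skip it.
    sh -c "$build_cmd" >/dev/null 2>&1 || return 125

    n=1
    while [ "$n" -le "$runs" ]; do
        # The failure is intermittent: one failing run marks the commit bad.
        sh -c "$test_cmd" >/dev/null 2>&1 || return 1
        n=$((n + 1))
    done

    # Every run passed; with a flaky test this only means "probably good".
    return 0
}
```

A wrapper script that calls classify_commit with a run count, a build command and a test command, and then exits with its return status, could be passed to git bisect run. A higher run count lowers the chance of mislabelling a bad commit as good, at the cost of a slower bisection.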

Applying

diff --git a/migration/ram.c b/migration/ram.c
index 56ff9cd29d..2dc546cf28 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3437,7 +3437,7 @@ static void ram_state_pending_exact(void *opaque, uint64_t max_size,
 
     uint64_t remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
 
-    if (!migration_in_postcopy()) {
+    if (!migration_in_postcopy() && remaining_size < max_size) {
         qemu_mutex_lock_iothread();
         WITH_RCU_READ_LOCK_GUARD() {
             migration_bitmap_sync_precopy(rs);

on top fixes or hides the issue. (c8df4a7a removed the remaining_size < max_size comparison.)

I arrived at this by experimentation; I haven't looked into why it makes a difference.

Steps to reproduce

  1. Run ACCEL=tcg ./run_tests.sh migration-skey-sequential with current QEMU master
  2. Repeat until the test fails (it doesn't fail on every run, but it is still easy to reproduce)
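The repeat-until-failure loop from step 2 can be scripted. The following is a minimal sketch, not part of kvm-unit-tests: repeat_until_fail and its attempt limit are hypothetical, and the sketch assumes that the command under test exits with a non-zero status on failure, which should be checked for the run_tests.sh version in use.

```sh
#!/bin/sh
# repeat_until_fail MAX CMD [ARGS...]
# Hypothetical helper: runs CMD up to MAX times and stops at the first
# non-zero exit status. Returns 1 if a failure was seen, 0 otherwise.
repeat_until_fail() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        if ! "$@"; then
            echo "failed on attempt $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $max attempts"
    return 0
}
```

Under that assumption, step 1 could be repeated with: repeat_until_fail 50 env ACCEL=tcg ./run_tests.sh migration-skey-sequential. Passing ACCEL through env, rather than as a prefix assignment on the function call, keeps the behaviour the same across POSIX shells.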