Replay/record does not work with `rrsnapshot`/`loadvm`

Host environment

  • Operating system: Ubuntu 20.04.6 LTS
  • OS/kernel version: Linux ub2004 5.4.0-153-generic #170-Ubuntu SMP Fri Jun 16 13:43:31 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
  • Architecture: x86_64
  • QEMU flavor: qemu-system-x86_64
  • QEMU version: 9.1.0
  • QEMU command line:
$ qemu-system-x86_64 \
  -cpu SandyBridge -smp 1 \
  -serial stdio -display none \
  -m 4096 \
  -drive file=./empty.qcow2,id=rr \
  -kernel ./boot/vmlinuz-lts \
  -initrd ./boot/initramfs-lts \
  -monitor telnet::12345,server,nowait \
  -append "console=ttyS0 root=/dev/ram0 alpine_dev=cdrom:iso9660 modules=loop,squashfs,sd-mod,usb-storage quiet" \
  -icount shift=auto,rrfile=rr,rr=replay,rrsnapshot=init

Emulated/Virtualized environment

  • Operating system: alpine-standard-3.20.3-x86_64.iso
  • OS/kernel version: -
  • Architecture: x86_64

Description of problem

QEMU's record/replay feature does not work properly when snapshots are used (for example, via rrsnapshot).

Record/replay without snapshotting works fine, but with rrsnapshot=... the replay gets stuck at boot. The loadvm monitor command also gets QEMU stuck.

Record command:

$ qemu-system-x86_64 \
  -cpu SandyBridge -smp 1 \
  -serial stdio -display none \
  -m 4096 \
  -drive file=./empty.qcow2,id=rr \
  -kernel ./boot/vmlinuz-lts \
  -initrd ./boot/initramfs-lts \
  -monitor telnet::12345,server,nowait \
  -append "console=ttyS0 root=/dev/ram0 alpine_dev=cdrom:iso9660 modules=loop,squashfs,sd-mod,usb-storage quiet" \
  -icount shift=auto,rrfile=rr,rr=record,rrsnapshot=init

Broken replay command, which gets QEMU stuck:

$ qemu-system-x86_64 \
  -cpu SandyBridge -smp 1 \
  -serial stdio -display none \
  -m 4096 \
  -drive file=./empty.qcow2,id=rr \
  -kernel ./boot/vmlinuz-lts \
  -initrd ./boot/initramfs-lts \
  -monitor telnet::12345,server,nowait \
  -append "console=ttyS0 root=/dev/ram0 alpine_dev=cdrom:iso9660 modules=loop,squashfs,sd-mod,usb-storage quiet" \
  -icount shift=auto,rrfile=rr,rr=replay,rrsnapshot=init

qemu-system-x86_64: warning: TCG doesn't support requested feature: CPUID.01H:ECX.tsc-deadline [bit 24]

Record/replay without rrsnapshot/loadvm/etc works as expected.
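The record command, the working replay command, and the broken replay command differ only in the value passed to -icount. A minimal sketch that makes the difference explicit; the helper name `icount_opt` is hypothetical and only illustrates the option strings used in this report:

```shell
#!/bin/sh
# Hypothetical helper (not part of QEMU): builds the -icount value used in this report.
# $1 = record|replay, $2 = optional snapshot name passed as rrsnapshot.
icount_opt() {
    opt="shift=auto,rrfile=rr,rr=$1"
    if [ -n "${2:-}" ]; then
        opt="$opt,rrsnapshot=$2"
    fi
    printf '%s\n' "$opt"
}

icount_opt record init   # recording with snapshot "init" (works)
icount_opt replay        # replay without rrsnapshot (works)
icount_opt replay init   # replay with rrsnapshot=init (gets stuck)
```

All other arguments (CPU model, memory, drive, kernel, initrd, monitor) are identical across the three runs.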

Steps to reproduce

To reproduce, I used the Alpine Linux kernel as the guest:

wget https://dl-cdn.alpinelinux.org/alpine/v3.20/releases/x86_64/alpine-standard-3.20.3-x86_64.iso
7z x alpine-standard-3.20.3-x86_64.iso

Prerequisites - an empty qcow2 file for snapshots:

qemu-img create -f qcow2 empty.qcow2 1G

Running the Alpine Linux kernel with rr=record works fine: the kernel boots and accepts input.

$ qemu-system-x86_64 \
  -cpu SandyBridge -smp 1 \
  -serial stdio -display none \
  -m 4096 \
  -drive file=./empty.qcow2,id=rr \
  -kernel ./boot/vmlinuz-lts \
  -initrd ./boot/initramfs-lts \
  -monitor telnet::12345,server,nowait \
  -append "console=ttyS0 root=/dev/ram0 alpine_dev=cdrom:iso9660 modules=loop,squashfs,sd-mod,usb-storage quiet" \
  -icount shift=auto,rrfile=rr,rr=record,rrsnapshot=init

qemu-system-x86_64: warning: TCG doesn't support requested feature: CPUID.01H:ECX.tsc-deadline [bit 24]
mount: mounting /dev/ram0 on /sysroot failed: Invalid argument
Mounting root failed. 
initramfs emergency recovery shell launched. Type 'exit' to continue boot
sh: can't access tty; job control turned off
~ # ls -alh
total 32K    
drwx------   18 root     root           0 Oct 21 13:02 .
drwx------   18 root     root           0 Oct 21 13:02 ..
-rw-------    1 root     root           8 Oct 21 13:02 .ash_history
drwxr-xr-x    2 root     root           0 Jun 18 12:44 .modloop
drwxr-xr-x    2 root     root           0 Oct 21 13:02 bin
drwxr-xr-x    9 root     root        2.5K Oct 21 13:02 dev
drwxr-xr-x    4 root     root           0 Oct 21 13:02 etc
-rwxr-xr-x    1 root     root       25.9K Jun 18 12:44 init
drwxr-xr-x    5 root     root           0 Jun 18 12:44 lib
drwxr-xr-x    5 root     root           0 Jun 18 12:44 media
drwxr-xr-x    2 root     root           0 Jun 18 12:44 newroot
dr-xr-xr-x  114 root     root           0 Oct 21 13:02 proc
drwx------    2 root     root           0 Sep  4 12:53 root
drwxr-xr-x    3 root     root           0 Oct 21 13:02 run
drwxr-xr-x    2 root     root           0 Oct 21 13:02 sbin
dr-xr-xr-x   13 root     root           0 Oct 21 13:02 sys
drwxr-xr-x    2 root     root           0 Oct 21 13:02 sysroot
drwxr-xr-x    2 root     root           0 Oct 21 13:02 tmp
drwxr-xr-x    5 root     root           0 Oct 21 13:02 usr
drwxr-xr-x    3 root     root           0 Jun 18 12:44 var
~ # echo "AAAAAAAA?"
AAAAAAAA?
~ # 
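Once the recording run has exited, it may be worth confirming that the snapshot named init was actually written to empty.qcow2 before attempting the replay. A rough sketch, assuming the snapshot tag appears in the second column of the `qemu-img snapshot -l` table; the helper `snapshot_listed` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given tag appears in
# `qemu-img snapshot -l` output read from stdin.
# Assumption: the tag is the second whitespace-separated column of each row.
snapshot_listed() {
    awk -v tag="$1" '$2 == tag { found = 1 } END { exit !found }'
}

# Run only when qemu-img and the image are available, and after QEMU has
# exited, since a running QEMU normally holds a lock on the image.
if command -v qemu-img >/dev/null 2>&1 && [ -f ./empty.qcow2 ]; then
    if qemu-img snapshot -l ./empty.qcow2 | snapshot_listed init; then
        echo "snapshot 'init' is present"
    else
        echo "snapshot 'init' is missing"
    fi
fi
```

If the snapshot is missing, the hang would point at the recording step rather than at loadvm during replay.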

This produces the rr file, which can be replayed without the rrsnapshot option:

$ qemu-system-x86_64 \
  -cpu SandyBridge -smp 1 \
  -serial stdio -display none \
  -m 4096 \
  -drive file=./empty.qcow2,id=rr \
  -kernel ./boot/vmlinuz-lts \
  -initrd ./boot/initramfs-lts \
  -monitor telnet::12345,server,nowait \
  -append "console=ttyS0 root=/dev/ram0 alpine_dev=cdrom:iso9660 modules=loop,squashfs,sd-mod,usb-storage quiet" \
  -icount shift=auto,rrfile=rr,rr=replay

qemu-system-x86_64: warning: TCG doesn't support requested feature: CPUID.01H:ECX.tsc-deadline [bit 24]
mount: mounting /dev/ram0 on /sysroot failed: Invalid argument
Mounting root failed. 
initramfs emergency recovery shell launched. Type 'exit' to continue boot
sh: can't access tty; job control turned off
~ # ls -alh
total 32K    
drwx------   18 root     root           0 Oct 21 13:02 .
drwx------   18 root     root           0 Oct 21 13:02 ..
-rw-------    1 root     root           8 Oct 21 13:02 .ash_history
drwxr-xr-x    2 root     root           0 Jun 18 12:44 .modloop
drwxr-xr-x    2 root     root           0 Oct 21 13:02 bin
drwxr-xr-x    9 root     root        2.5K Oct 21 13:02 dev
drwxr-xr-x    4 root     root           0 Oct 21 13:02 etc
-rwxr-xr-x    1 root     root       25.9K Jun 18 12:44 init
drwxr-xr-x    5 root     root           0 Jun 18 12:44 lib
drwxr-xr-x    5 root     root           0 Jun 18 12:44 media
drwxr-xr-x    2 root     root           0 Jun 18 12:44 newroot
dr-xr-xr-x  114 root     root           0 Oct 21 13:02 proc
drwx------    2 root     root           0 Sep  4 12:53 root
drwxr-xr-x    3 root     root           0 Oct 21 13:02 run
drwxr-xr-x    2 root     root           0 Oct 21 13:02 sbin
dr-xr-xr-x   13 root     root           0 Oct 21 13:02 sys
drwxr-xr-x    2 root     root           0 Oct 21 13:02 sysroot
drwxr-xr-x    2 root     root           0 Oct 21 13:02 tmp
drwxr-xr-x    5 root     root           0 Oct 21 13:02 usr
drwxr-xr-x    3 root     root           0 Jun 18 12:44 var
~ # echo "AAAAAAAA?"
AAAAAAAA?
~ # 

As you can see, replaying the emulation session works as expected. However, if I add the rrsnapshot option, the replay gets stuck:

$ qemu-system-x86_64 \
  -cpu SandyBridge -smp 1 \
  -serial stdio -display none \
  -m 4096 \
  -drive file=./empty.qcow2,id=rr \
  -kernel ./boot/vmlinuz-lts \
  -initrd ./boot/initramfs-lts \
  -monitor telnet::12345,server,nowait \
  -append "console=ttyS0 root=/dev/ram0 alpine_dev=cdrom:iso9660 modules=loop,squashfs,sd-mod,usb-storage quiet" \
  -icount shift=auto,rrfile=rr,rr=replay,rrsnapshot=init

qemu-system-x86_64: warning: TCG doesn't support requested feature: CPUID.01H:ECX.tsc-deadline [bit 24] 

This can also be reproduced without the rrsnapshot option by issuing loadvm init from the QEMU monitor:

$ telnet localhost 12345
qemu> loadvm init
...

It can also be reproduced by attaching gdb and issuing reverse-execution commands that require loadvm to restore a previous state, such as reverse-stepi or reverse-continue.

Attaching a debugger and adding debug prints shows a thread stuck in rcu.c, near qemu_event_wait(&rcu_call_ready_event);. I waited for about an hour and the replay made no progress.
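To show where every thread is blocked, not just the one near the RCU code, backtraces of all threads can be dumped from the hung process. A minimal sketch, assuming gdb and pidof are installed on the host; the helper name `dump_threads` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print backtraces of all threads of a running process.
# $1 = process name. Returns 1 if no such process is found.
dump_threads() {
    pid=$(pidof -s "$1" 2>/dev/null || true)
    if [ -z "$pid" ]; then
        echo "no running $1 process found" >&2
        return 1
    fi
    # -batch makes gdb exit after running the command.
    # Attaching may require root or a relaxed ptrace_scope setting.
    gdb -p "$pid" -batch -ex "thread apply all bt"
}

dump_threads qemu-system-x86_64 || echo "nothing captured"
```

Running this while the replay is stuck and attaching the output to the report would show what the vCPU and main-loop threads are waiting on.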

Additional information

QEMU build. QEMU binary built from the 9.1.0 sources with --target-list=x86_64-softmmu.

Host machine. An almost clean Ubuntu 20.04 install with the packages needed to build QEMU from the latest release sources.

Edited by kotborealis