Replay/record does not work with `rrsnapshot`/`loadvm`
Host environment
- Operating system: Ubuntu 20.04.6 LTS
- OS/kernel version: Linux ub20045.4.0-153-generic #170-Ubuntu SMP Fri Jun 16 13:43:31 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
- Architecture: x86_64
- QEMU flavor: qemu-system-x86_64
- QEMU version: 9.1.0
- QEMU command line:
$ qemu-system-x86_64 \
-cpu SandyBridge -smp 1 \
-serial stdio -display none \
-m 4096 \
-drive file=./empty.qcow2,id=rr \
-kernel ./boot/vmlinuz-lts \
-initrd ./boot/initramfs-lts .
-monitor telnet::12345,server,nowait \
-append "console=ttyS0 root=/dev/ram0 alpine_dev=cdrom:iso9660 modules=loop,squashfs,sd-mod,usb-storage quiet" \
-icount shift=auto,rrfile=rr,rr=replay,rrsnapshot=init
Emulated/Virtualized environment
- Operating system: alpine-standard-3.20.3-x86_64.iso
- OS/kernel version: -
- Architecture: x86_64
Description of problem
Qemu's record/replay feature does not properly work when using snapshots (like rrsnapshot).
Record/replay without snapshotting works just fine, but when using rrsnapshot=...
the replay is stuck at boot. loadvm
monitor command also gets qemu stuck.
Record command:
$ qemu-system-x86_64 \
-cpu SandyBridge -smp 1 \
-serial stdio -display none \
-m 4096 \
-drive file=./empty.qcow2,id=rr \
-kernel ./boot/vmlinuz-lts \
-initrd ./boot/initramfs-lts .
-monitor telnet::12345,server,nowait \
-append "console=ttyS0 root=/dev/ram0 alpine_dev=cdrom:iso9660 modules=loop,squashfs,sd-mod,usb-storage quiet" \
-icount shift=auto,rrfile=rr,rr=record,rrsnapshot=init
Broken replay command, which gets qemu stuck:
$ qemu-system-x86_64 \
-cpu SandyBridge -smp 1 \
-serial stdio -display none \
-m 4096 \
-drive file=./empty.qcow2,id=rr \
-kernel ./boot/vmlinuz-lts \
-initrd ./boot/initramfs-lts .
-monitor telnet::12345,server,nowait \
-append "console=ttyS0 root=/dev/ram0 alpine_dev=cdrom:iso9660 modules=loop,squashfs,sd-mod,usb-storage quiet" \
-icount shift=auto,rrfile=rr,rr=replay,rrsnapshot=init
qemu-system-x86_64: warning: TCG doesn't support requested feature: CPUID.01H:ECX.tsc-deadline [bit 24]
Record/replay without rrsnapshot
/loadvm
/etc works as expected.
Steps to reproduce
To reproduce i've used alpine linux kernel as the guest:
wget https://dl-cdn.alpinelinux.org/alpine/v3.20/releases/x86_64/alpine-standard-3.20.3-x86_64.iso
7z x alpine-standard-3.20.3-x86_64.iso
Prerequisites - an empty qcow2 file for snapshots:
qemu-img create -f qcow2 empty.qcow2 1G
Running an alpine linux kernel with rr=record
- works just fine, kernel boots, accepts input.
$ qemu-system-x86_64 \
-cpu SandyBridge -smp 1 \
-serial stdio -display none \
-m 4096 \
-drive file=./empty.qcow2,id=rr \
-kernel ./boot/vmlinuz-lts \
-initrd ./boot/initramfs-lts .
-monitor telnet::12345,server,nowait \
-append "console=ttyS0 root=/dev/ram0 alpine_dev=cdrom:iso9660 modules=loop,squashfs,sd-mod,usb-storage quiet" \
-icount shift=auto,rrfile=rr,rr=record,rrsnapshot=init
qemu-system-x86_64: warning: TCG doesn't support requested feature: CPUID.01H:ECX.tsc-deadline [bit 24]
mount: mounting /dev/ram0 on /sysroot failed: Invalid argument
Mounting root failed.
initramfs emergency recovery shell launched. Type 'exit' to continue boot
sh: can't access tty; job control turned off
~ # ls -alh
total 32K
drwx------ 18 root root 0 Oct 21 13:02 .
drwx------ 18 root root 0 Oct 21 13:02 ..
-rw------- 1 root root 8 Oct 21 13:02 .ash_history
drwxr-xr-x 2 root root 0 Jun 18 12:44 .modloop
drwxr-xr-x 2 root root 0 Oct 21 13:02 bin
drwxr-xr-x 9 root root 2.5K Oct 21 13:02 dev
drwxr-xr-x 4 root root 0 Oct 21 13:02 etc
-rwxr-xr-x 1 root root 25.9K Jun 18 12:44 init
drwxr-xr-x 5 root root 0 Jun 18 12:44 lib
drwxr-xr-x 5 root root 0 Jun 18 12:44 media
drwxr-xr-x 2 root root 0 Jun 18 12:44 newroot
dr-xr-xr-x 114 root root 0 Oct 21 13:02 proc
drwx------ 2 root root 0 Sep 4 12:53 root
drwxr-xr-x 3 root root 0 Oct 21 13:02 run
drwxr-xr-x 2 root root 0 Oct 21 13:02 sbin
dr-xr-xr-x 13 root root 0 Oct 21 13:02 sys
drwxr-xr-x 2 root root 0 Oct 21 13:02 sysroot
drwxr-xr-x 2 root root 0 Oct 21 13:02 tmp
drwxr-xr-x 5 root root 0 Oct 21 13:02 usr
drwxr-xr-x 3 root root 0 Jun 18 12:44 var
~ # echo "AAAAAAAA?"
AAAAAAAA?
~ #
rr
-file is produced, which can be used for replaying without rrsnapshot
-option:
$ qemu-system-x86_64 \
-cpu SandyBridge -smp 1 \
-serial stdio -display none \
-m 4096 \
-drive file=./empty.qcow2,id=rr \
-kernel ./boot/vmlinuz-lts \
-initrd ./boot/initramfs-lts .
-monitor telnet::12345,server,nowait \
-append "console=ttyS0 root=/dev/ram0 alpine_dev=cdrom:iso9660 modules=loop,squashfs,sd-mod,usb-storage quiet" \
-icount shift=auto,rrfile=rr,rr=replay
qemu-system-x86_64: warning: TCG doesn't support requested feature: CPUID.01H:ECX.tsc-deadline [bit 24]
mount: mounting /dev/ram0 on /sysroot failed: Invalid argument
Mounting root failed.
initramfs emergency recovery shell launched. Type 'exit' to continue boot
sh: can't access tty; job control turned off
~ # ls -alh
total 32K
drwx------ 18 root root 0 Oct 21 13:02 .
drwx------ 18 root root 0 Oct 21 13:02 ..
-rw------- 1 root root 8 Oct 21 13:02 .ash_history
drwxr-xr-x 2 root root 0 Jun 18 12:44 .modloop
drwxr-xr-x 2 root root 0 Oct 21 13:02 bin
drwxr-xr-x 9 root root 2.5K Oct 21 13:02 dev
drwxr-xr-x 4 root root 0 Oct 21 13:02 etc
-rwxr-xr-x 1 root root 25.9K Jun 18 12:44 init
drwxr-xr-x 5 root root 0 Jun 18 12:44 lib
drwxr-xr-x 5 root root 0 Jun 18 12:44 media
drwxr-xr-x 2 root root 0 Jun 18 12:44 newroot
dr-xr-xr-x 114 root root 0 Oct 21 13:02 proc
drwx------ 2 root root 0 Sep 4 12:53 root
drwxr-xr-x 3 root root 0 Oct 21 13:02 run
drwxr-xr-x 2 root root 0 Oct 21 13:02 sbin
dr-xr-xr-x 13 root root 0 Oct 21 13:02 sys
drwxr-xr-x 2 root root 0 Oct 21 13:02 sysroot
drwxr-xr-x 2 root root 0 Oct 21 13:02 tmp
drwxr-xr-x 5 root root 0 Oct 21 13:02 usr
drwxr-xr-x 3 root root 0 Jun 18 12:44 var
~ # echo "AAAAAAAA?"
AAAAAAAA?
~ #
As you can see, replaying emulation session works as expected. How ever, if I add the rrsnapshot
-option, it gets stuck:
$ qemu-system-x86_64 \
-cpu SandyBridge -smp 1 \
-serial stdio -display none \
-m 4096 \
-drive file=./empty.qcow2,id=rr \
-kernel ./boot/vmlinuz-lts \
-initrd ./boot/initramfs-lts .
-monitor telnet::12345,server,nowait \
-append "console=ttyS0 root=/dev/ram0 alpine_dev=cdrom:iso9660 modules=loop,squashfs,sd-mod,usb-storage quiet" \
-icount shift=auto,rrfile=rr,rr=replay,rrsnapshot=init
qemu-system-x86_64: warning: TCG doesn't support requested feature: CPUID.01H:ECX.tsc-deadline [bit 24]
This also can be reproduced without rrsnapshot
option, by issuing loadvm init
from qemu monitor:
$ telnet localhost 12345
qemu> loadvm init
...
Or, by using gdb
and issuing reverse-commands that require loadvm
to load previous state, like reverse-stepi
or reverse-continue
.
Attaching a debugger & using debug-prints shows some thread being stuck in the rcu.c
, near the qemu_event_wait(&rcu_call_ready_event);
. I've tried to wait for quite some time (about an hour) and there was no result.
Additional information
Qemu build. Qemu binary built from sources of 9.1.0 with --target-list=x86_64-softmmu
.
Host machine. An almost clean Ubuntu 20.04 with necessary packages for building qemu from the latest release sources.