guest panics when attempting to perform loadvm operation on x86_64 platform with kvm_intel ept=0
Host environment
- Operating system: Ubuntu 22.04.3 LTS
- OS/kernel version: Linux ubuntu 6.7.0-rc6 #198 SMP PREEMPT_DYNAMIC
- Architecture: x86
- QEMU flavor: qemu-system-x86_64
- QEMU version: QEMU emulator version 8.2.50 (v8.2.0-1871-g158a054c4d-dirty)
- QEMU command line:
./qemu-system-x86_64 -machine q35 \
    -m 4G \
    -bios ./qboot.rom \
    -kernel ./vmlinux \
    -initrd ./rootfs.img \
    -nographic \
    -append "root=/dev/ram0 rw rootfstype=ext4 nokaslr console=ttyS0 earlyprintk=serial,ttyS0 panic=30" \
    -smp cpus=1 \
    -enable-kvm \
    -drive if=virtio,format=qcow2,file=backup.qcow2 \
    -cpu host,pmu=off
Emulated/Virtualized environment
- Operating system: Ubuntu 22.04.3 LTS
- OS/kernel version: 6.7.0-rc6
- Architecture: x86
Description of problem
The guest panics when the loadvm operation is performed after it has been running for a while on the x86_64 platform with kvm_intel ept=0. I'm unsure whether this operation is permitted, but it works properly with kvm_intel ept=1.
Steps to reproduce
- Load the kvm-intel module with the parameter ept=0.
- savevm: boot the first guest using the command line above, switch to the QEMU console, and execute the savevm operation. Afterwards, shut down the guest.
- loadvm: boot a second guest using the same command line, switch to the QEMU console, and execute the loadvm operation. After that, the guest panics.
Additional information
I have done some debugging, and the issue seems to be that the VMM modifies guest memory without informing the KVM module. Upon further investigation, I noticed that the loadvm operation only restores the memory contents and does not execute any ioctl to modify the user memory regions recorded in the KVM module.
The KVM module calls kvm_mmu_reset_context() to unload the current EPT or SPT page table when the guest system registers (CR0/CR3/CR4) are restored. With EPT, the page table is released directly and can be reconstructed at a later stage. With SPT, in contrast, KVM only decreases the reference count and keeps the outdated SPT page table on the active list it maintains. As a result, this outdated SPT page table is reused later, leading to incorrect mappings.
To address this, I tried calling kvm_arch_flush_shadow_all() from kvm_mmu_reset_context() to zap all the page tables, which allowed the guest to function properly with SPT after the loadvm operation.
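The experiment above can be sketched as follows. This is pseudocode in C syntax, paraphrasing the kernel paths from memory rather than quoting them, so the names and call sites should be checked against the actual source tree:

```
/* Sketch of the asymmetry (not actual kernel source):
 *
 * kvm_mmu_reset_context()
 *   -> kvm_mmu_unload()
 *        -> kvm_mmu_free_roots()  // EPT: root freed outright;
 *                                 // SPT: only the refcount drops,
 *                                 //      the page stays on the
 *                                 //      active list for reuse
 *   -> kvm_init_mmu()
 *
 * Experiment: zap everything instead of only dropping roots.
 */
void kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
{
        kvm_mmu_unload(vcpu);
        kvm_arch_flush_shadow_all(vcpu->kvm);  /* added: zaps all
                                                  shadow pages */
        kvm_init_mmu(vcpu);
}
```

With this change the stale SPT pages can no longer be picked up from the active list after loadvm, at the cost of rebuilding the shadow tables from scratch on every context reset.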
Therefore, I believe QEMU should notify KVM to clear all the page tables when KVM is using shadow paging. However, there appears to be no appropriate ioctl available for the VMM to achieve this.
Tracing the kvm_mmu_get_page() event shows only one record indicating that the outdated page table is reused instead of being recreated:

perf record -a -e kvmmmu:kvm_mmu_get_page