nVMX: QEMU does not clear nVMX state through KVM (L0) when the guest (L2) triggers a reboot through I/O port 0xCF9
Host environment
- Operating system: (Ubuntu20.04)
- OS/kernel version: (5.10+)
- Architecture: (x86)
- QEMU flavor: (qemu-system-x86_64)
- QEMU version: (6.0.0)
- QEMU command line:
qemu-system-x86_64 -netdev user,id=net0,hostfwd=tcp::5555-:5555,hostfwd=tcp::5554-:5554 -device e1000,netdev=net0,bus=pcie.0,addr=0xA -display gtk,gl=on -device virtio-gpu-pci -m 4G -smp 1 -drive if=pflash,format=raw,file=/home/test/rc_578/OVMF.fd -drive file=/home/test/rc_578/image.qcow2,if=none,id=disk1,discard=unmap,detect-zeroes=unmap -device virtio-blk-pci,drive=disk1,bootindex=1 -device intel-hda -device hda-duplex,audiodev=spk -audiodev id=spk,timer-period=5000,driver=pa,server=/run/user/1000/pulse/native,in.fixed-settings=off,out.fixed-settings=off -chardev socket,id=charserial0,path=/tmp/kernel-console,server,nowait,logfile=/tmp/serial.log -serial chardev:charserial0 -monitor stdio -M q35 -machine kernel_irqchip=on -k en-us -cpu host,-waitpkg -enable-kvm -device qemu-xhci,id=xhci,p2=8,p3=8 -device usb-mouse -device usb-kbd -device intel-iommu,device-iotlb=on,caching-mode=on -nodefaults
Emulated/Virtualized environment
- Operating system: (Linux running on top of a lightweight hypervisor (L1))
- OS/kernel version: (5.10)
- Architecture: (x86)
Description of problem
Background:
We have a lightweight hypervisor (iKGT) that monitors a very limited set of resources and passes most resources through to its guest. I/O port 0xCF9 is also passed through, so when the guest triggers a reboot by writing to I/O port 0xCF9, the hardware performs the platform reset directly.
We ported iKGT to run under QEMU+KVM, which makes it a nested virtualization setup: KVM (L0), iKGT (L1), guest (L2).
We filed a bug against KVM, and according to the maintainer's comments it appears to be a QEMU issue. Link: bugzilla-215964. So we are asking the QEMU community for help.
Steps to reproduce
The guest (L2) writes to I/O port 0xCF9 to trigger a platform reboot.
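For readers unfamiliar with the 0xCF9 reset path, the write in the step above can be sketched as follows. This is a minimal illustration, not the code iKGT or the L2 kernel actually runs: the bit names follow the Reset Control Register layout in Intel chipset datasheets, the helper name cf9_reset_sequence is hypothetical, and the exact value the guest writes is not stated in this report. The two-step order (select the reset type first, then set the trigger bit) mirrors what Linux does for its CF9 reboot method.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reset Control Register, as documented for Intel ICH/PCH chipsets. */
#define CF9_PORT      0xCF9u  /* I/O port of the register */
#define CF9_SYS_RST   0x02u   /* bit 1: hard reset instead of soft reset */
#define CF9_RST_CPU   0x04u   /* bit 2: a 0 -> 1 transition triggers the reset */
#define CF9_FULL_RST  0x08u   /* bit 3: full reset (power cycle) */

#define CF9_HARD_RESET  (CF9_SYS_RST | CF9_RST_CPU)                 /* 0x06 */
#define CF9_FULL_RESET  (CF9_SYS_RST | CF9_RST_CPU | CF9_FULL_RST)  /* 0x0E */

/*
 * Hypothetical helper: compute the two byte values to write to CF9_PORT.
 * `current` is the value read back from the port, `reset_code` is
 * CF9_HARD_RESET or CF9_FULL_RESET. Returns the number of writes.
 */
static size_t cf9_reset_sequence(uint8_t current, uint8_t reset_code,
                                 uint8_t out[2])
{
    assert(out != NULL);

    /* Clear the reset bits so that RST_CPU sees a clean 0 -> 1 edge. */
    uint8_t base = (uint8_t)(current & ~reset_code);

    out[0] = (uint8_t)(base | CF9_SYS_RST); /* step 1: request a hard reset */
    out[1] = (uint8_t)(base | reset_code);  /* step 2: trigger the reset */
    return 2;
}
```

In a ring-0 guest, each value in `out` would be written with an `out` instruction to CF9_PORT, with a short delay between the two writes. Because iKGT passes this port through, the write is not intercepted by L1 in this setup.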
Expected result:
KVM performs a virtual platform reset and reboots the guest.
Current result:
QEMU/KVM appears to reset only part of the vCPU (L2) state and does not clear the nVMX state, so KVM still tries to emulate a VM exit to iKGT (L1). We still observe VM exits in iKGT (L1), and the exit reason is not the expected one.