Windows crashes on resuming from sleep if hv-tlbflush is enabled
Host environment
- Operating system: Arch Linux
- OS/kernel version: Linux 5.18.16-zen1-1-zen #1 ZEN SMP PREEMPT_DYNAMIC Wed, 03 Aug 2022 11:25:10 +0000 x86_64 GNU/Linux
- Architecture: x86_64 Intel i9-12900K
- QEMU flavor: qemu-system-x86_64
- QEMU version: 7.0.0
- QEMU command line: I can't provide a complete reproducing command line, because right now I can't even get a clean Windows install to go to sleep without crashing, but these are the relevant options in the instance that works:
-machine q35 -smp cores=24 -cpu host,hv-vpindex,hv-tlbflush
Emulated/Virtualized environment
- Operating system: Windows 10 21H2
- OS/kernel version: -
- Architecture: x86_64
Steps to reproduce
- Boot Windows
- Tell Windows to go to sleep (observe that QEMU's state switches to suspended)
- Cause Windows to wake up (e.g. using the system_wakeup HMP command)
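The wake-up step can also be scripted over QMP instead of typing the HMP command by hand. The following is only a minimal sketch: it assumes QEMU was started with a QMP UNIX socket (for example -qmp unix:/tmp/qmp.sock,server=on,wait=off, where the path is made up for illustration), and the helper names qmp_message, read_reply, execute and wake_guest are hypothetical. It uses the QMP commands query-status and system_wakeup.

```python
import json
import socket


def qmp_message(command, arguments=None):
    """Encode one QMP command as a newline-terminated JSON message."""
    msg = {"execute": command}
    if arguments:
        msg["arguments"] = arguments
    return (json.dumps(msg) + "\r\n").encode("utf-8")


def read_reply(stream):
    """Return the next command reply, skipping asynchronous events."""
    while True:
        line = stream.readline()
        if not line:
            raise ConnectionError("QMP socket closed")
        reply = json.loads(line)
        if "event" not in reply:
            return reply


def execute(stream, command):
    """Send one command and return the contents of its 'return' field."""
    stream.write(qmp_message(command))
    stream.flush()
    reply = read_reply(stream)
    if "error" in reply:
        raise RuntimeError(reply["error"].get("desc", "QMP error"))
    return reply["return"]


def wake_guest(socket_path):
    """Wake the guest if it is suspended; return the state seen beforehand."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        stream = sock.makefile("rwb")
        stream.readline()  # greeting banner sent by QEMU on connect
        execute(stream, "qmp_capabilities")  # leave capability negotiation mode
        status = execute(stream, "query-status")["status"]
        if status == "suspended":
            execute(stream, "system_wakeup")
        return status
```

Calling wake_guest("/tmp/qmp.sock") after the guest has gone to sleep should return "suspended" and trigger the wake-up; any other return value means the guest was not suspended and nothing was sent.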
Description of problem
The above steps cause my Windows VM to BSOD immediately upon waking up (even before restarting the display driver in my case).
Additional information
Looking at the crash dumps always shows the "ATTEMPTED WRITE TO READONLY MEMORY" error, and always with this stack trace:
nt!KeBugCheckEx
nt!MiRaisedIrqlFault+0x1413a6
nt!MmAccessFault+0x4ef
nt!KiPageFault+0x35e
nt!MiIncreaseUsedPtesCount+0x12
nt!MiBuildForkPte+0xc6
nt!MiCloneVads+0x4ab
nt!MiCloneProcessAddressSpace+0x261
nt!MmInitializeProcessAddressSpace+0x1cb631
nt!PspAllocateProcess+0x1d13
nt!PspCreateProcess+0x242
nt!NtCreateProcessEx+0x85
nt!KiSystemServiceCopyEnd+0x25
ntdll!NtCreateProcessEx+0x14
However, the process that is being created here is always WerFault.exe, i.e. the crash reporter. The crashing process is seemingly random. Removing hv-tlbflush from the command line resolves the problem.
Hence, my hypothesis is that due to improper TLB flushing during wakeup, a random application on the core will crash, which spawns WerFault.exe, which then immediately crashes again inside the kernel (also because of bad/stale TLB contents) and causes the BSOD. Perhaps one core wakes up first and requests a TLB flush, which is then not propagated to sleeping cores due to hv-tlbflush. Then one of those cores wakes up without the TLB flush?
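To make the hypothesis concrete, here is a toy model of the suspected sequence. It is not QEMU, KVM or Hyper-V code, and it proves nothing about the real implementation: the VCpu class, the skip_sleeping switch and the page numbers are all invented for illustration. It only shows how a flush request that skips sleeping vCPUs would leave one of them with a stale translation after wakeup.

```python
class VCpu:
    """Hypothetical vCPU with a private TLB (virtual page -> physical page)."""

    def __init__(self, index):
        self.index = index
        self.asleep = False
        self.tlb = {}


def translate(cpu, page_table, vpage):
    """Use the cached translation if present, else walk the page table."""
    if vpage not in cpu.tlb:
        cpu.tlb[vpage] = page_table[vpage]
    return cpu.tlb[vpage]


def remote_flush(cpus, skip_sleeping):
    """Flush every vCPU's TLB, optionally dropping requests to sleeping ones."""
    for cpu in cpus:
        if skip_sleeping and cpu.asleep:
            continue  # assumed bug: the flush request is lost
        cpu.tlb.clear()


def simulate(skip_sleeping):
    """Return True if vCPU 1 sees the current mapping after waking up."""
    page_table = {0x1000: 0xA000}
    cpus = [VCpu(0), VCpu(1)]
    for cpu in cpus:
        translate(cpu, page_table, 0x1000)  # both vCPUs cache the old mapping
        cpu.asleep = True  # the guest goes to sleep
    cpus[0].asleep = False  # vCPU 0 wakes up first ...
    page_table[0x1000] = 0xB000  # ... remaps the page ...
    remote_flush(cpus, skip_sleeping)  # ... and requests a flush
    cpus[1].asleep = False  # vCPU 1 wakes up later
    return translate(cpus[1], page_table, 0x1000) == page_table[0x1000]
```

In this model simulate(skip_sleeping=True) returns False, because vCPU 1 still uses the old mapping, while simulate(skip_sleeping=False) returns True.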