RISC-V: Instruction fetch exceptions can have invalid tval/epc combinations
Host environment
- Operating system: NixOS (unstable)
- OS/kernel version: Linux 5.19.0
- Architecture: x86_64
- QEMU flavor: qemu-system-riscv64
- QEMU version: 7.0.0 or master (a6b1c53e)
- QEMU command line:
qemu-system-riscv64 -m 512M -M virt -nographic -kernel Image -append "earlycon=sbi" -initrd initrd.cpio -d int -D log.txt
Emulated/Virtualized environment
- Operating system: Linux 5.19
- OS/kernel version: Linux 5.19
- Architecture: riscv64
Description of problem
Instruction page-fault, instruction guest-page-fault and instruction access-fault exceptions can have invalid epc/tval combinations, as shown in this excerpt from the debug log:
riscv_cpu_do_interrupt: hart:0, async:0, cause:0000000000000014, epc:0xffffffff802fec76, tval:0xffffffff802ff000, desc=guest_exec_page_fault
riscv_cpu_do_interrupt: hart:0, async:0, cause:0000000000000014, epc:0xffffffff80243fe6, tval:0xffffffff80244000, desc=guest_exec_page_fault
From the privileged spec:
"If mtval is written with a nonzero value when an instruction access-fault or page-fault exception occurs on a system with variable-length instructions, then mtval will contain the virtual address of the portion of the instruction that caused the fault, while mepc will point to the beginning of the instruction."
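RISC-V currently only has 16-bit and 32-bit instructions, so the difference tval - epc should be either 0 (the fault is on the first halfword of the instruction) or 2 (the fault is on the second halfword of a 32-bit instruction that crosses into the faulting page). In the examples above the differences are 906 (0x38a) and 26 (0x1a) respectively.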
Possibly notable: all occurrences of these invalid combinations seem to have tval aligned to a page boundary.
Steps to reproduce
This reproducer only produces invalid tval/epc combinations for instruction guest-page faults, but I've found it to be the easiest one to describe, since running KVM inside RISC-V QEMU is presumably a standard setup. I have not been able to find a more minimal case.
- Start a QEMU-based riscv64 machine.
- Start a KVM-based virtual machine with QEMU inside it.
- Run some workload in the KVM-based virtual machine to increase the chance of page faults.
- Look in the debug log of the outer QEMU for guest_exec_page_fault exceptions with tval ending in 000 but epc ending in neither 000 nor ffe (with a page-aligned tval, the only valid epc values are tval and tval - 2, which end in 000 and ffe respectively).
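The last step can be automated by scanning the log written by -D log.txt. This is a sketch under stated assumptions: the function name find_inconsistent_fetch_faults is hypothetical, the line format is taken from the excerpt in this report and may differ between QEMU versions, and instead of the hex-suffix heuristic it applies the general tval - epc rule to all three instruction fetch fault causes (1, 12 and 20).

```python
import re

# Matches "-d int" trace lines in the format shown in this report, e.g.
# riscv_cpu_do_interrupt: hart:0, async:0, cause:0000000000000014, epc:0x..., tval:0x..., desc=...
# (assumption: the format may differ between QEMU versions).
LINE_RE = re.compile(
    r"riscv_cpu_do_interrupt: hart:(\d+), async:(\d), cause:([0-9a-fA-F]+), "
    r"epc:0x([0-9a-fA-F]+), tval:0x([0-9a-fA-F]+), desc=(\S+)"
)

# Synchronous instruction fetch exception codes from the privileged spec:
# 1 = instruction access fault, 12 = instruction page fault,
# 20 (0x14) = instruction guest-page fault.
FETCH_FAULT_CAUSES = {1, 12, 20}


def find_inconsistent_fetch_faults(lines):
    """Yield (epc, tval, desc) for fetch faults where tval - epc is not 0 or 2."""
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not an interrupt/exception trace line
        is_async = m.group(2) == "1"
        cause = int(m.group(3), 16)
        epc = int(m.group(4), 16)
        tval = int(m.group(5), 16)
        # Skip interrupts, non-fetch exceptions and unreported (zero) tval.
        if is_async or cause not in FETCH_FAULT_CAUSES or tval == 0:
            continue
        if tval - epc not in (0, 2):
            yield epc, tval, m.group(6)
```

Passing the lines of log.txt (for example an open file object) to the function lists every offending exception along with its epc and tval.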
Everything in both layers of guests should otherwise work without issue, but other or future software that relies on the spec-mandated relationship between epc and tval may break.