QEMU's NVMe emulator is not standard-compliant
Host environment
- Operating system: Ubuntu 22
- OS/kernel version: 5.15.90.1-microsoft-standard-WSL2
- Architecture: x86
- QEMU flavor: qemu-system-x86_64
- QEMU version: 8.0.50
- QEMU command line:
./qemu-system-x86_64 -drive file=disk.img,if=none,format=raw,id=nvm -device nvme,serial=deadbeef,drive=nvm
Emulated/Virtualized environment
- Operating system: Custom kernel
- OS/kernel version: -
- Architecture: x86
Description of problem
QEMU's NVMe emulator deviates from the standard in a few places.

First, bits 0, 6 and 7 are all set in the CAP.CSS register. Bit 7 indicates that the NVMe Controller does not support any I/O Command Set, while bit 6 is set when the NVMe Controller supports one or more I/O Command Sets (see Figure 36 of the NVM Express® Base Specification, Revision 2.0c). These two bits contradict each other; only bit 6 (and bit 0) should be set. The bits are configured in hw/nvme/ctrl.c:8250.

Second, the emulator checks whether the values of CC.IOSQES and CC.IOCQES are within the allowed range at the moment the controller is enabled by setting CC.EN to 1. This check should not be performed that early, because the allowed range can only be discovered after the controller is enabled: the Identify command reports the valid range in the Identify Controller Data Structure, but submitting that command requires an enabled controller, which in turn, at least in the current version, requires valid values in CC.IOSQES and CC.IOCQES.

Third, the emulator also applies the values configured in CC.IOSQES and CC.IOCQES to the Admin Queues, which, from what I understand, should not be the case. Only the I/O Queues should use these values. These checks are done in hw/nvme/ctrl.c:7199f. The same function already uses the values to initialize the controller's cqe_size and sqe_size, which should also happen at a later time.
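To make the CAP.CSS observation easier to check from a guest, here is a minimal sketch in C. It assumes the 64-bit CAP register has already been read from offset 0 of BAR0 and that CSS occupies bits 44:37, as laid out in the Base Specification. The helper names and macros are hypothetical and are not taken from QEMU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: CAP.CSS is the 8-bit field at bits 44:37 of CAP. */
#define CAP_CSS_SHIFT   37
#define CAP_CSS_MASK    0xffu

/* Bit positions inside the CSS field (names are illustrative). */
#define CSS_NVM_CMD_SET (1u << 0) /* NVM Command Set supported */
#define CSS_IO_CMD_SETS (1u << 6) /* one or more I/O Command Sets supported */
#define CSS_ADMIN_ONLY  (1u << 7) /* no I/O Command Set supported */

/* Extract the CSS field from a raw CAP register value. */
static inline uint8_t cap_css(uint64_t cap)
{
    return (uint8_t)((cap >> CAP_CSS_SHIFT) & CAP_CSS_MASK);
}

/*
 * Report the contradiction described above: "no I/O Command Set"
 * (bit 7) advertised together with bit 0 or bit 6.
 */
static inline bool cap_css_is_contradictory(uint64_t cap)
{
    uint8_t css = cap_css(cap);

    return (css & CSS_ADMIN_ONLY) &&
           (css & (CSS_NVM_CMD_SET | CSS_IO_CMD_SETS));
}
```

With bits 0, 6 and 7 set, the CSS field reads 0xC1 and `cap_css_is_contradictory` returns true. With only bits 0 and 6 set (0x41), it returns false.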
Steps to reproduce
- Start any virtual machine with an NVMe Controller attached.
- Read the value of CAP.CSS (located in BAR0 of the PCIe NVMe Controller). This value will be contradictory.
- Follow the initialization procedure as described in section 3.5.1 of the NVM Express® Base Specification, Revision 2.0c. Do not set the values of CC.IOSQES and CC.IOCQES.
- When CC.EN is set to 1, the NVMe Controller will fail to enable: it sets CSTS.CFS to 1 and reports the respective trace event (pci_nvme_err_startfail_cqent_too_small and variations).
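The ordering that the steps above rely on can be sketched in C as follows. This is only an illustration of the reading given in this report, not QEMU or driver code: the controller is enabled with CC.IOSQES and CC.IOCQES left at 0, Identify Controller is issued on the Admin Queue, and only then are the I/O entry sizes programmed. The sketch assumes CC.IOSQES at bits 19:16 and CC.IOCQES at bits 23:20, and that the SQES and CQES bytes of the Identify Controller Data Structure (offsets 512 and 513) carry the required size in their low nibble as a power of two. All helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed CC field positions. */
#define CC_IOSQES_SHIFT    16
#define CC_IOCQES_SHIFT    20
#define CC_ENTRY_SIZE_MASK 0xfu

/* Admin Queue entry sizes are fixed and do not depend on CC. */
#define ADMIN_SQE_BYTES 64u
#define ADMIN_CQE_BYTES 16u

/*
 * Return cc with IOSQES/IOCQES replaced by the required sizes taken
 * from the SQES/CQES bytes that Identify Controller returned.
 */
static inline uint32_t cc_set_io_entry_sizes(uint32_t cc, uint8_t id_sqes,
                                             uint8_t id_cqes)
{
    uint32_t iosqes = id_sqes & 0xfu; /* low nibble: required size (log2) */
    uint32_t iocqes = id_cqes & 0xfu;

    cc &= ~((CC_ENTRY_SIZE_MASK << CC_IOSQES_SHIFT) |
            (CC_ENTRY_SIZE_MASK << CC_IOCQES_SHIFT));
    cc |= (iosqes << CC_IOSQES_SHIFT) | (iocqes << CC_IOCQES_SHIFT);
    return cc;
}

/* Submission queue entry size in bytes: only I/O queues consult CC. */
static inline uint32_t sq_entry_bytes(bool is_admin_queue, uint32_t cc)
{
    if (is_admin_queue) {
        return ADMIN_SQE_BYTES;
    }
    return 1u << ((cc >> CC_IOSQES_SHIFT) & CC_ENTRY_SIZE_MASK);
}

/* Completion queue entry size in bytes: only I/O queues consult CC. */
static inline uint32_t cq_entry_bytes(bool is_admin_queue, uint32_t cc)
{
    if (is_admin_queue) {
        return ADMIN_CQE_BYTES;
    }
    return 1u << ((cc >> CC_IOCQES_SHIFT) & CC_ENTRY_SIZE_MASK);
}
```

In this sketch the Admin Queue sizes are usable even while CC.IOSQES and CC.IOCQES are still 0, which is the state the controller is in when the Identify command is submitted in the reproduction steps.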