USB event delivery does not work correctly for macOS guests with XHCI controller without MSI(-X)

While implementing support for the 'vmapple' aarch64 machine type for running macOS/arm64 as a QEMU guest, I've been struggling with USB event delivery. The guest would only process HID events periodically, after what seemed like a timeout had elapsed.

Host environment

Operating system: Any
OS/kernel version: N/A
Architecture: x86-64, aarch64
QEMU flavor: qemu-system-x86_64, qemu-system-aarch64
QEMU version: 9.2.0-rc2
QEMU command line: On x86-64, use: -device nec-usb-xhci,id=xhci,msi=off,msix=off to define the USB controller. On arm64 use the linked (unmerged as of 9.2) vmapple machine type.

'vmapple' patch set: https://patchew.org/QEMU/20241129152506.59390-1-phil@philjordan.eu/

Emulated/Virtualized environment

Operating system: macOS
OS/kernel version: 12 (others also affected)
Architecture: x86-64, aarch64

Steps to reproduce

Get a macOS VM working, either on x86-64 with a Q35 machine type, AppleSMC device, and OpenCore bootloader, or on aarch64 using the patch set and instructions linked above.
On x86-64, switch to a NEC XHCI controller with MSI and MSI-X support forcibly disabled: -device nec-usb-xhci,id=xhci,msi=off,msix=off
Boot macOS.

USB events are now extremely laggy. A USB keyboard or mouse becomes almost unusable.

While narrowing down the problem, I established the following facts by experimentation, tracing, and code inspection:

Although the vmapple platform uses an emulated XHCI PCI device for connecting virtual USB devices, the platform does not support message-signalled interrupts, in either the MSI or the MSI-X variant. (This is true in Apple's implementation as well, but the macOS guest's XHCI driver unsurprisingly does work with Apple's PCI/XHCI implementation.)
macOS guests (and the iBoot bootloader) appear to refuse to drive XHCI controllers with numintrs < 4, for both aarch64 and x86-64 architectures. They will generally set up event rings 0, 1, and 2.
QEMU's PCI XHCI implementation does not appear to implement (as of 9.2.0-rc2) any mitigations for when the controller is used in pin-based IRQ mode. It will happily queue events on event rings >0 in this case, but the interrupts for those rings are dropped.
Linux and FreeBSD guests appear to use only interrupter 0 anyway, so these are not useful references.

It's not entirely clear to me which component is ultimately responsible for the failure here. I suspect there might be some not-quite-right behaviour in both macOS's XHCI driver and QEMU's XHCI implementation, and that these combine to produce a non-functional setup.

Research

To better understand the problem, I ended up reading parts of the XHCI specification (1.2) and I found the following mentions of the subject of non-MSI(-X) operation. In my mind, some of this leaves quite a lot unspecified, see comments:

4.17 Interrupters, IMPLEMENTATION NOTE

[…] Interrupters and PCI Interrupt Mechanisms When the PCI Pin Interrupt is activated: • Interrupter 0 may assert the INTx# pin. • Interrupters 1 to MaxIntrs-1 shall be disabled.

What exactly does "disabled" mean here? Should the IMAN_IE (interrupt enable) flag be forced off in the interrupter's IMAN register?

4.17.1 Interrupter Mapping

[…] If the Number of Interrupters (MaxIntrs) field is greater than 1, then Interrupter Mapping shall be supported. […] If Interrupter Mapping is not supported, the Interrupter Target field shall be ignored by the xHC and all Events targeted at Interrupter 0.

So, the device doesn't necessarily know whether it will be driven with MSI-X support or if it will need to make do with a legacy PCI IRQ pin. So I think advertising MaxIntrs = 16 is valid. But at the same time, if the driver does not initialise the device with MSI(-X), whether due to lack of system support or for another reason, then should we be following the "If Interrupter Mapping is not supported" reasoning?

4.17.2 Interrupt Moderation

[…] If the PCI Interrupt Pin mechanism is enabled, then the assertion of Interrupt Pending (IP) asserts the appropriate PCI INTx# pin. And the IP flag is cleared by software writing the IMAN register.

4.17.3 Interrupt Pin Support

PCI Interrupt Pins are optional. Four Interrupt Pins are supported by PCI, however PCI only allows one Interrupt Pin to be assigned to a single PCI Function. If an xHC implementation supports a PCI INTx# interrupt pin, xHC asserts its INTx# line when requesting attention from its device driver unless the xHC is enabled to use Message Signaled Interrupts (MSI, i.e. the MSI Message Control MSI Enable or MSI-X Message Control MSI-X Enable flags are true) (refer to Sections 5.2.8.1 and 5.2.8.2 for more information). Once the INTx# signal is asserted, it remains asserted until the device driver clears the Interrupt Pending (IP) flag. When Interrupt Pending (IP) is cleared, the device deasserts its INTx# signal.

If Interrupt Pin support is enabled, then only Interrupter 0 is enabled and any other Interrupters are disabled.

Same wording as earlier, without clarification of what exactly "disabled" means.

The Interrupt Pin register in the PCI Configuration Space Header (refer to Interrupt Pin description in section 6.2.4 of the PCI specification) identifies which interrupt pin the device (or device function) uses. A value of 1 corresponds to INTA#, 2 corresponds to INTB#, and so on. If the xHC implementation does not use an interrupt pin it shall declare a ‘0’ in this register.

4.17.5 Interrupt Blocking

[…] The Interrupt Pending (IP) flag of an Interrupter shall be managed as follows: […]

If MSI or MSI-X interrupts are enabled, IP shall be cleared to ‘0’ automatically when the PCI Dword write generated by the Interrupt assertion is complete.

If PCI Pin Interrupts are enabled then, IP shall be cleared to ‘0’ by software.

This part appears to be implemented correctly by QEMU - the return value of the relevant raise_intr function indicates whether an MSI(-X) message was sent and IP should thus be reset automatically, or not. The xhci_runtime_write function meanwhile resets IP when written:

        if (val & IMAN_IP) {
            intr->iman &= ~IMAN_IP;
        }
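
To make the two IP-clearing paths concrete, here is a small standalone model in C. It is only a sketch and not QEMU's code: every model_* name is hypothetical, the transport hook merely mimics the "returns true if an MSI(-X) message was sent" convention described above, and checks such as USBCMD_INTE and the event handler busy flag are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IMAN_IP (1u << 0)   /* Interrupt Pending */
#define IMAN_IE (1u << 1)   /* Interrupt Enable */

/* Hypothetical stand-in for one interrupter's IMAN register. */
typedef struct {
    uint32_t iman;
} ModelInterrupter;

/*
 * Hypothetical transport hook. Returns true if an MSI/MSI-X message
 * was sent, false if the INTx# pin was asserted instead.
 */
typedef bool (*model_raise_fn)(int v);

static bool model_raise_msi(int v) { (void)v; return true; }
static bool model_raise_pin(int v) { (void)v; return false; }

/* Device side: an event became pending on interrupter v. */
static void model_signal(ModelInterrupter *intr, int v, model_raise_fn raise)
{
    assert(intr != NULL && raise != NULL);
    intr->iman |= IMAN_IP;
    if (!(intr->iman & IMAN_IE)) {
        return;                     /* interrupter not enabled by the driver */
    }
    if (raise(v)) {
        intr->iman &= ~IMAN_IP;     /* MSI(-X): IP clears automatically */
    }
    /* Pin mode: IP stays set until software clears it. */
}

/* Driver side: write to IMAN. IP is write-1-to-clear, IE is read/write. */
static void model_iman_write(ModelInterrupter *intr, uint32_t val)
{
    if (val & IMAN_IP) {
        intr->iman &= ~IMAN_IP;
    }
    intr->iman = (intr->iman & ~IMAN_IE) | (val & IMAN_IE);
}
```

In this model, signalling through model_raise_msi leaves IP clear, while signalling through model_raise_pin leaves IP set until the driver writes IMAN with the IP bit set.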

Observations and possible solutions

I don't think the spec sufficiently addresses the pin-based interrupt case.

It repeatedly talks about "enabling" or "disabling" interrupters 1+, without explaining what that means.
The device itself can't know at the time when HCSPARAMS1 is read whether it will run in MSI(-X) or IRQ pin mode, so returning maxintrs>1 is reasonable.
The section on interrupter mapping support is interesting but stops short of properly specifying the situation, because it only does so in terms of maxintrs. How can any device ensure mapping "shall be supported" in the absence of MSI(-X) at runtime?
Hypothetically, the guest could activate or even de-activate MSI(-X) subsequent to initialising the device. (e.g. during firmware -> OS hand-off?) Absolute statements about whether the device "supports" interrupt mapping don't make sense in this context.

Overall, my impression is that:

macOS's XHCI driver should probably accept devices for which maxintrs = 1 is set. (Though advertising maxintrs = 1 would only be a sensible workaround in a VM. It would not help with a physical device that supports MSI-X on a host system where MSI(-X) support happens to not work properly.)
macOS's XHCI driver should probably not attempt to use event rings >0 when using pin-based interrupts.
QEMU should probably "disable" interrupters 1 through maxintrs-1 when using pin-based mode and fall back to some more sensible behaviour than attempting to deliver events on rings >0.

I'm not entirely convinced that the behaviour of macOS's XHCI driver is actually against the spec, although it's certainly questionable. And it's perhaps not great that QEMU should need modification to work with guests with questionable behaviour, but it wouldn't be the first time.

Solution

I tried a bunch of different things: ignoring writes to runtime registers for v >= 1, etc. Ultimately, the only thing that made everything suddenly spring into life was to special-case pin-based mode in xhci_event and override the argument so that v = 0; in that case. This is interpreting "If Interrupter Mapping is not supported, the Interrupter Target field shall be ignored by the xHC and all Events targeted at Interrupter 0." from section 4.17.1 to extend to pin-based interrupt mode.
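
The redirect described above can be modelled in a few lines of standalone C. This is a sketch under assumptions, not the actual patch: ModelXHCI and the model_* functions are hypothetical, the two booleans stand in for the PCI MSI/MSI-X enable state, and a per-ring counter replaces the real event ring.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAXINTRS 16

/* Hypothetical, heavily reduced controller state. */
typedef struct {
    bool msi_enabled;       /* stand-in for the PCI MSI enable bit */
    bool msix_enabled;      /* stand-in for the PCI MSI-X enable bit */
    unsigned numintrs;      /* number of interrupters advertised */
    unsigned events_queued[MODEL_MAXINTRS];  /* per-ring event count */
} ModelXHCI;

/* Pin-based mode: neither MSI nor MSI-X has been enabled by the guest. */
static bool model_pin_mode(const ModelXHCI *x)
{
    return !x->msi_enabled && !x->msix_enabled;
}

/*
 * Queue an event for interrupter v. In pin-based mode the target is
 * overridden to 0, treating this mode like "Interrupter Mapping is not
 * supported" in section 4.17.1. Returns the ring actually used.
 */
static unsigned model_event(ModelXHCI *x, unsigned v)
{
    assert(x->numintrs >= 1 && x->numintrs <= MODEL_MAXINTRS);
    if (model_pin_mode(x)) {
        v = 0;
    }
    assert(v < x->numintrs);
    x->events_queued[v]++;
    return v;
}
```

With MSI and MSI-X both off, an event aimed at interrupter 2 lands on ring 0. Once MSI-X is enabled, the same call uses ring 2.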

That leaves the question of detecting this mode. By the time we check the return value from xhci_pci_intr_raise, it's too late: the event shouldn't have been added to an event ring other than ring 0 in the first place. At the same time, we don't really want to pollute the generic XHCI code with PCI specifics. I can see 2 options:

Add an intr_mapping_supported function pointer to XHCIState. The function returns a boolean to say whether we allow events on anything other than event ring 0. If the function pointer is NULL, we assume that if maxintrs > 1, then yes, interrupt mapping is supported. For the PCI case, the function will query msix_enabled and msi_enabled. This is a little ugly: we keep checking the PCI configuration area, via a function pointer no less, and the subsequent raise_intr call checks it again.
Track the interrupt mapping support as a boolean variable in XHCIState. The PCI XHCI device registers a custom config_write callback which detects if the MSI(-X) state has changed and updates the interrupt mapping support state in XHCIState. This could even be extended to swizzle the intr_raise/intr_update function between MSI, MSI-X, and pin-based implementations to avoid re-checking on every call.
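
The two options can be compared side by side in a standalone C model. Everything here is a hypothetical sketch: ModelXHCIState is not QEMU's XHCIState, the config write hook takes the MSI/MSI-X enable bits directly instead of decoding a PCI configuration space write, and the names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ModelXHCIState ModelXHCIState;

struct ModelXHCIState {
    unsigned numintrs;
    /* Option 1: optional query hook supplied by the bus-specific device. */
    bool (*intr_mapping_supported)(ModelXHCIState *xhci);
    /* Option 2: cached flag, updated when the MSI(-X) state changes. */
    bool mapping_supported_cached;
    /* Stand-ins for the PCI MSI/MSI-X enable bits. */
    bool msi_enabled;
    bool msix_enabled;
};

/* Option 1, PCI side: re-check the MSI(-X) state on every query. */
static bool model_pci_intr_mapping_supported(ModelXHCIState *xhci)
{
    return xhci->msi_enabled || xhci->msix_enabled;
}

/* Option 1, generic side: a NULL hook falls back to the numintrs rule. */
static bool model_mapping_supported_opt1(ModelXHCIState *xhci)
{
    assert(xhci != NULL);
    if (xhci->intr_mapping_supported == NULL) {
        return xhci->numintrs > 1;
    }
    return xhci->intr_mapping_supported(xhci);
}

/* Option 2, PCI side: called when a config write changes MSI(-X) state. */
static void model_pci_config_write(ModelXHCIState *xhci, bool msi, bool msix)
{
    xhci->msi_enabled = msi;
    xhci->msix_enabled = msix;
    xhci->mapping_supported_cached = (msi || msix) && xhci->numintrs > 1;
}

/* Option 2, generic side: pick the event ring without touching PCI state. */
static unsigned model_target_ring(const ModelXHCIState *xhci, unsigned v)
{
    return xhci->mapping_supported_cached ? v : 0;
}
```

The trade-off is visible in the model: option 1 keeps a single source of truth but consults it on every event, while option 2 makes the per-event path a plain field read at the cost of keeping the cached flag in sync.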

Edited Dec 01, 2024 by Phil Dennis-Jordan