vfio/pci: Static Resizable BAR capability
The PCI Resizable BAR (ReBAR) capability is currently hidden from the VM because the protocol for interacting with the capability does not support a mechanism for the device to reject an advertised supported BAR size. However, when assigned to a VM, the act of resizing the BAR requires adjustment of host resources for the device, which absolutely can fail. Linux does not currently allow us to reserve resources for the device independent of the current usage.

The only writable field within the ReBAR capability is the BAR Size register. The PCIe spec indicates that when written, the device should immediately begin to operate with the provided BAR size. The spec however also notes that software must only write values corresponding to supported sizes as indicated in the capability and control registers. Writing unsupported sizes produces undefined results. Therefore, if the hypervisor were to virtualize the capability and control registers such that the current size is the only indicated available size, then a write of anything other than the current size falls into the category of undefined behavior, where we can essentially expose the modified ReBAR capability as read-only.

This may seem pointless, but users have reported that virtualizing the capability in this way not only allows guest software to expose related features as available (even if only cosmetic), but in some scenarios can resolve guest driver issues. Additionally, no regressions in behavior have been reported for this change.

A caveat here is that the PCIe spec requires for compatibility that devices report support for a size in the range of 1MB to 512GB, therefore if the current BAR size falls outside that range we revert to hiding the capability.

Reviewed-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/r/20230505232308.2869912-1-alex.williamson@redhat.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
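The single-size masking described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the function name `rebar_static_cap` and the `REBAR_*` constants are hypothetical, and the bit positions are assumptions based on the ReBAR register layout (BAR Size field in bits 13:8 of the control register, size bits 4..23 of the capability register covering 1MB to 512GB).

```c
#include <stdint.h>

/* Hypothetical names; values are assumed from the ReBAR register layout. */
#define REBAR_CTRL_BAR_SIZE_MASK   0x00003F00u /* BAR Size field, bits 13:8 */
#define REBAR_CTRL_BAR_SIZE_SHIFT  8
#define REBAR_CAP_SIZES_1M_512G    0x00FFFFF0u /* cap bits 4..23: 1MB..512GB */

/*
 * Build a capability register value that advertises only the size
 * currently programmed in the control register.
 *
 * Returns 0 and stores the value in *virt_cap on success, or -1 when the
 * current size falls outside 1MB..512GB, in which case the caller would
 * fall back to hiding the capability entirely.
 */
int rebar_static_cap(uint32_t ctrl, uint32_t *virt_cap)
{
    /* Size encoding: 0 = 1MB, 1 = 2MB, ..., 19 = 512GB, ... */
    uint32_t size = (ctrl & REBAR_CTRL_BAR_SIZE_MASK) >> REBAR_CTRL_BAR_SIZE_SHIFT;

    /* Size n maps to capability bit n + 4; larger encodings do not fit. */
    uint32_t cap = size <= 27 ? (uint32_t)1 << (size + 4) : 0;

    /* Require the one advertised size to be in the mandatory range. */
    if (!(cap & REBAR_CAP_SIZES_1M_512G)) {
        return -1;
    }

    *virt_cap = cap;
    return 0;
}
```

With the capability register reduced to one bit, any guest write of a different size is outside what the spec defines, so the emulated registers can simply be treated as read-only.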
I'd just like to comment that this patch changes the behaviour of AMD RX 7000 series GPUs. While tools like GPU-Z report that Resizable BAR is enabled, and the Windows resources report that the full BAR is available, AMD's drivers rely on the ReBAR capability flag to correctly program the GPU. This can be confirmed through the AMD Adrenalin Software where, without this patch, AMD SmartAccess Memory is reported as unsupported.

It has been verified that this is not simply a cosmetic change: if the capability is not properly advertised to the guest, the Windows DirectX12 API ID3D12Device3::OpenExistingHeapFromAddress causes the GPU driver to fault internally with out-of-memory errors after it has been used. It appears this is due to this API exhausting the address space of the BAR when the heap is mapped into its address space.

The detailed error reported by DirectX12 when this fault occurs is:
D3D12 ERROR: Kernel memory failure. There might be a memory leak. [ EXECUTION ERROR #834: MAP_OUTOFMEMORY_RETURN]
D3D12 WARNING: Using ID3D12DebugDevice2::ReportLiveDeviceObjects with D3D12_RLDO_DETAIL will help drill into object lifetimes. [ STATE_CREATION WARNING #255: LIVE_OBJECT_SUMMARY]
D3D12 WARNING: Live ID3D12Device at 0x0000023BF4429E80, Refcount: 16 [ STATE_CREATION WARNING #274: LIVE_DEVICE]
D3D12 WARNING: Live ID3D12ShaderCacheSession : 2 [ STATE_CREATION WARNING #255: LIVE_OBJECT_SUMMARY]
D3D12 WARNING: Live ID3D12RootSignature : 6 [ STATE_CREATION WARNING #255: LIVE_OBJECT_SUMMARY]
D3D12 WARNING: Live ID3D12PipelineState : 5 [ STATE_CREATION WARNING #255: LIVE_OBJECT_SUMMARY]
D3D12 WARNING: Live ID3D12Resource : 31 [ STATE_CREATION WARNING #255: LIVE_OBJECT_SUMMARY]
D3D12 WARNING: Live ID3D12Heap : 11 [ STATE_CREATION WARNING #255: LIVE_OBJECT_SUMMARY]
D3D12 WARNING: Live ID3D12CommandQueue : 4 [ STATE_CREATION WARNING #255: LIVE_OBJECT_SUMMARY]
D3D12 WARNING: Live ID3D12Fence : 10 [ STATE_CREATION WARNING #255: LIVE_OBJECT_SUMMARY]
D3D12 WARNING: Live ID3D12CommandAllocator : 34 [ STATE_CREATION WARNING #255: LIVE_OBJECT_SUMMARY]
D3D12 WARNING: Live ID3D12GraphicsCommandList : 6 [ STATE_CREATION WARNING #255: LIVE_OBJECT_SUMMARY]
D3D12: Removing Device.
Edited by Geoffrey McRae -
mentioned in issue #703