AMD IOMMU does not properly wrap command queue
## Host environment
- Operating system: Linux (Slackware)
- OS/kernel version: 6.19.7
- Architecture: x86_64
- QEMU flavor: qemu-system-x86_64
- QEMU version: 10.2.1
- QEMU command line: `qemu-system-x86_64 -cpu host,vmx -M q35,accel=kvm,kernel-irqchip=split -enable-kvm -device amd-iommu,dma-remap=on,intremap=on,xtsup=on`
## Emulated/Virtualized environment
- Operating system: None
- OS/kernel version: Custom OS
- Architecture: x86_64
## Description of problem
QEMU is asked to emulate an AMD IOMMU, which it exposes at 0xfed80000. The custom OS initializes the IOMMU and, as part of that, sets the MMIO register CMD_BUF_BASE (offset 0x8) to 0x80000007ac80000. This configures ComLen=1000b (256 entries). After the IOMMU is enabled and 256 commands have been placed in the command buffer, the CMD_HEAD register (offset 0x2000) remains stuck at CmdHeadPtr[18:4] = 256. This value is invalid: valid CmdHeadPtr values lie in the range [0,255], and the head pointer is supposed to wrap from 255 back to 0 rather than reach 256.
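The register arithmetic above can be sketched in C as follows. This is a minimal illustration, not code from the custom OS or from QEMU. The helper names are hypothetical, and the field positions (ComLen in bits 59:56 and ComBase in bits 51:12 of CMD_BUF_BASE, 16-byte command entries) are assumed from the AMD IOMMU specification.

```c
#include <stdint.h>

/* Assumed layout: ComLen is bits 59:56 of CMD_BUF_BASE and encodes the
 * buffer size as a power of two (1000b -> 256 entries). */
static inline uint32_t cmdbuf_entries(uint64_t cmd_buf_base)
{
    uint32_t com_len = (uint32_t)((cmd_buf_base >> 56) & 0xfu);
    return 1u << com_len;
}

/* Assumed layout: ComBase is bits 51:12 (4 KiB aligned physical address). */
static inline uint64_t cmdbuf_base_addr(uint64_t cmd_buf_base)
{
    return cmd_buf_base & 0x000ffffffffff000ull;
}

/* CmdHeadPtr[18:4] is a 15-bit entry index; each entry is 16 bytes,
 * which is why the low 4 bits of the register are not part of the field. */
static inline uint32_t cmd_head_index(uint64_t cmd_head_reg)
{
    return (uint32_t)((cmd_head_reg >> 4) & 0x7fffu);
}

/* Expected advance of a ring index: 255 -> 0 for a 256-entry buffer.
 * 'entries' must be a power of two, which ComLen guarantees. */
static inline uint32_t cmdbuf_next_index(uint32_t index, uint32_t entries)
{
    return (index + 1u) & (entries - 1u);
}
```

With the value from this report, `cmdbuf_entries(0x80000007ac80000)` yields 256 and `cmdbuf_base_addr` yields 0x7ac80000. The 15-bit CmdHeadPtr field is wide enough to hold 256, so the register can physically report it even though it is outside the configured ring.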
## Steps to reproduce
1. Initialize CMD_BUF_BASE with ComLen=8 (256 entries)
2. Put more than 256 INVALIDATE_IOMMU_ALL commands into the command buffer one by one, as follows:
   - put one command into the buffer
   - advance the tail pointer
   - wait for head == tail, indicating that the IOMMU has consumed the command
3. The head == tail completion wait succeeds for the first 255 commands, which shows that the command buffer is running and the IOMMU is consuming commands. At command 256, the buffer gets stuck: the tail has wrapped to 0 while the head reports 256, so head == tail never becomes true. This suggests there is a bug in the wraparound logic in QEMU.
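The submit-and-wait loop from the steps above can be sketched against a toy device model. Everything here is hypothetical: `struct toy_iommu`, `toy_iommu_consume_one` and `submit_commands` are illustration names, the toy consumes exactly one command per tail update, and the non-wrapping branch only mimics the symptom seen in this report. It is not QEMU's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

#define ENTRIES 256u /* ComLen = 8 */

/* Toy stand-in for the device side of the command buffer. */
struct toy_iommu {
    uint32_t head;  /* entry index the device reports via CMD_HEAD */
    uint32_t tail;  /* entry index the driver writes via CMD_TAIL */
    bool wrap_head; /* true: head wraps at ENTRIES, as seen on bare metal */
};

/* Consume one command and advance the head. */
static void toy_iommu_consume_one(struct toy_iommu *d)
{
    uint32_t next = d->head + 1u;
    if (d->wrap_head)
        next %= ENTRIES; /* 255 -> 0 */
    d->head = next;      /* without wrapping, 255 -> 256 */
}

/* Submit 'total' commands one at a time. Returns how many completed
 * before the first head == tail wait failed, or 'total' if none failed. */
static uint32_t submit_commands(struct toy_iommu *d, uint32_t total)
{
    for (uint32_t i = 0; i < total; i++) {
        /* Step 1: writing the command into slot d->tail is omitted here. */
        /* Step 2: advance the tail pointer, wrapping at the ring size. */
        d->tail = (d->tail + 1u) % ENTRIES;
        toy_iommu_consume_one(d);
        /* Step 3: a single check stands in for the bounded polling loop. */
        if (d->head != d->tail)
            return i; /* completion timeout */
    }
    return total;
}
```

Starting from head = tail = 0 with `wrap_head` set to false, `submit_commands(&dev, 300)` returns 255 and leaves the head at 256, matching the behavior described for QEMU. With `wrap_head` set to true, all 300 commands complete.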
## Additional information
I compared the behavior in QEMU with bare-metal AMD hardware to confirm that QEMU is misbehaving:
```
in QEMU               on bare-metal
CMDQ.HEAD=252         CMDQ.HEAD=252
CMDQ.HEAD=253         CMDQ.HEAD=253
CMDQ.HEAD=254         CMDQ.HEAD=254
CMDQ.HEAD=255         CMDQ.HEAD=255
CMDQ.HEAD=256         CMDQ.HEAD=0
completion timeout    CMDQ.HEAD=1
```