
NVMe command id changes for use-after-free CQE detection

John Meneghini requested to merge johnmeneghini/centos-stream-9:abort_fix3 into main

Bugzilla: http://bugzilla.redhat.com/2044616

We currently cannot detect a (buggy) controller that sends us a completion for a request that has already been completed (for example, by sending the same completion twice). This has been observed on customer systems, mostly in virtualized environments with ESXi virtual NVMe controllers.

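To protect against this, we reserve the upper 4 bits (the MSBs) of the NVMe SQE command_id and use them as a 4-bit generation counter. The counter is incremented every time a request is submitted in an SQE, and it is verified against the command_id carried in every CQE, so a completion whose generation does not match the request's current generation is rejected instead of completing the wrong request.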
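The sketch below models that lifecycle for a single tag slot, assuming a simplified per-request structure. The names `sim_request`, `sim_submit` and `sim_complete_ok` are hypothetical illustrations, not the kernel's own helpers; a real driver keeps the counter in its per-request data and performs the check in its completion path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-request state for one tag slot. */
struct sim_request {
    uint16_t tag;     /* 12-bit tag, fixed for the life of the slot */
    uint8_t  genctr;  /* 4-bit generation, bumped on every reuse */
};

/* Issue the request: bump the generation (mod 16) and build the
 * command_id that would be placed in the SQE. */
static inline uint16_t sim_submit(struct sim_request *rq)
{
    rq->genctr = (rq->genctr + 1) & 0xf;              /* wraps after 16 uses */
    return (uint16_t)((rq->genctr << 12) | (rq->tag & 0xfff));
}

/* Validate a CQE's command_id against the request currently holding
 * the tag. Returns false for a stale (or duplicate) completion. */
static inline bool sim_complete_ok(const struct sim_request *rq, uint16_t cid)
{
    if ((cid & 0xfff) != rq->tag)
        return false;                                 /* different slot */
    return ((cid >> 12) & 0xf) == rq->genctr;         /* generation must match */
}
```

Submitting twice on tag 0x2a (starting from generation 0) yields command_ids 0x102a and then 0x202a; once the tag has been reissued, a late duplicate CQE carrying 0x102a fails the check. Because the counter has only 4 bits, a stale completion still slips through if the tag has been reissued a multiple of 16 times since, so this is a cheap heuristic rather than a guarantee.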

The 16-bit command_id is now laid out as follows:

    | xxxx | xxxxxxxxxxxx |
      gen    request tag
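Packing and unpacking this layout takes one shift and two masks. The following is a minimal sketch assuming the bit positions in the diagram (bits 15..12 for the generation, bits 11..0 for the tag); the macro and function names are hypothetical and are not the kernel's own helpers.

```c
#include <assert.h>
#include <stdint.h>

#define CID_GEN_SHIFT 12
#define CID_GEN_MASK  0xfu    /* 4-bit generation counter */
#define CID_TAG_MASK  0xfffu  /* 12-bit request tag */

/* Pack a generation and a tag into a 16-bit command_id.
 * The generation is reduced mod 16 so the counter can wrap freely. */
static inline uint16_t cid_encode(unsigned gen, unsigned tag)
{
    assert(tag <= CID_TAG_MASK);  /* tag must fit in 12 bits */
    return (uint16_t)(((gen & CID_GEN_MASK) << CID_GEN_SHIFT) | tag);
}

/* Extract the 4-bit generation from a command_id. */
static inline unsigned cid_gen(uint16_t cid)
{
    return (cid >> CID_GEN_SHIFT) & CID_GEN_MASK;
}

/* Extract the 12-bit request tag from a command_id. */
static inline unsigned cid_tag(uint16_t cid)
{
    return cid & CID_TAG_MASK;
}
```

For example, generation 3 and tag 0x02a encode to 0x302a, and `cid_gen()` / `cid_tag()` recover 3 and 0x02a. A generation of 17 is masked down to 1, so generation 17 with tag 0xfff encodes to 0x1fff.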

This means we give up some possible queue depth: 12 bits can encode at most 4096 distinct tags (0 to 4095) instead of the 65536 available with the full 16 bits. However, we never create queues that deep anyway, so no real harm is done.
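The capacity trade-off is simple arithmetic: 16 bits give 2^16 = 65536 tags, while 12 bits give 2^12 = 4096, with the remaining 4 bits providing 16 generations per tag. The helper `depth_fits_tag_space()` below is hypothetical and only checks that arithmetic; the exact limit a real driver enforces may be lower.

```c
#include <stdbool.h>

#define CID_BITS      16
#define CID_GEN_BITS  4
#define CID_TAG_BITS  (CID_BITS - CID_GEN_BITS)  /* 12 */

/* Number of distinct tags each layout can encode. */
#define TAGS_FULL_CID (1u << CID_BITS)           /* 65536 */
#define TAGS_WITH_GEN (1u << CID_TAG_BITS)       /* 4096: tags 0..4095 */

_Static_assert(TAGS_FULL_CID == 65536, "16-bit tag space");
_Static_assert(TAGS_WITH_GEN == 4096, "12-bit tag space");

/* Hypothetical check: can every outstanding command in a queue of
 * this depth get a unique 12-bit tag? */
static inline bool depth_fits_tag_space(unsigned depth)
{
    return depth >= 1 && depth <= TAGS_WITH_GEN;
}
```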

Signed-off-by: John Meneghini <jmeneghi@redhat.com>
