
net/mlx5: Discard command completions in internal error

Kamal Heib requested to merge kheib/centos-stream-9:44237 into main

JIRA: https://issues.redhat.com/browse/RHEL-44237
CVE: CVE-2024-38555

commit db9b31aa9bc56ff0d15b78f7e827d61c4a096e40
Author: Akiva Goldberger <agoldberger@nvidia.com>
Date: Thu May 9 14:29:51 2024 +0300

net/mlx5: Discard command completions in internal error  

Fix use after free when FW completion arrives while device is in  
internal error state. Avoid calling completion handler in this case,  
since the device will flush the command interface and trigger all  
completions manually.  

Kernel log:  
------------[ cut here ]------------  
refcount_t: underflow; use-after-free.  
...  
RIP: 0010:refcount_warn_saturate+0xd8/0xe0  
...  
Call Trace:  
<IRQ>  
? __warn+0x79/0x120  
? refcount_warn_saturate+0xd8/0xe0  
? report_bug+0x17c/0x190  
? handle_bug+0x3c/0x60  
? exc_invalid_op+0x14/0x70  
? asm_exc_invalid_op+0x16/0x20  
? refcount_warn_saturate+0xd8/0xe0  
cmd_ent_put+0x13b/0x160 [mlx5_core]  
mlx5_cmd_comp_handler+0x5f9/0x670 [mlx5_core]  
cmd_comp_notifier+0x1f/0x30 [mlx5_core]  
notifier_call_chain+0x35/0xb0  
atomic_notifier_call_chain+0x16/0x20  
mlx5_eq_async_int+0xf6/0x290 [mlx5_core]  
notifier_call_chain+0x35/0xb0  
atomic_notifier_call_chain+0x16/0x20  
irq_int_handler+0x19/0x30 [mlx5_core]  
__handle_irq_event_percpu+0x4b/0x160  
handle_irq_event+0x2e/0x80  
handle_edge_irq+0x98/0x230  
__common_interrupt+0x3b/0xa0  
common_interrupt+0x7b/0xa0  
</IRQ>  
<TASK>  
asm_common_interrupt+0x22/0x40  

Fixes: 51d138c2610a ("net/mlx5: Fix health error state handling")  
Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com>  
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>  
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>  
Link: https://lore.kernel.org/r/20240509112951.590184-6-tariqt@nvidia.com  
Signed-off-by: Jakub Kicinski <kuba@kernel.org>  

Signed-off-by: Kamal Heib <kheib@redhat.com>
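
Reviewer note: the double completion described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the driver code or the actual diff. Every name below (`model_dev`, `model_cmd`, `model_fw_completion`, `model_flush`, `model_run`) is hypothetical, and the model assumes a single in-flight command holding one reference that the completion handler drops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the mlx5 driver's types. */
enum model_dev_state { MODEL_STATE_UP, MODEL_STATE_INTERNAL_ERROR };

struct model_cmd {
    int refcnt;               /* references held on the in-flight command */
};

struct model_dev {
    enum model_dev_state state;
    struct model_cmd cmd;     /* assumption: one in-flight command */
    int underflows;           /* counts puts attempted at refcnt == 0 */
};

/* Drop one reference; record an underflow instead of going negative. */
static void model_cmd_put(struct model_dev *dev)
{
    assert(dev != NULL);
    if (dev->cmd.refcnt == 0) {
        dev->underflows++;    /* models "refcount_t: underflow" */
        return;
    }
    dev->cmd.refcnt--;
}

/* Completion handler: completing the command consumes its reference. */
static void model_comp_handler(struct model_dev *dev)
{
    model_cmd_put(dev);
}

/*
 * FW completion event. With 'guard' set, the event is discarded while the
 * device is in internal error, because the flush path below completes the
 * command on its own. Returns true if the handler ran.
 */
static bool model_fw_completion(struct model_dev *dev, bool guard)
{
    if (guard && dev->state == MODEL_STATE_INTERNAL_ERROR)
        return false;
    model_comp_handler(dev);
    return true;
}

/* Error flush: the device triggers outstanding completions manually. */
static void model_flush(struct model_dev *dev)
{
    model_comp_handler(dev);
}

/* Replay the race: a late FW completion arrives, then the flush runs. */
int model_run(bool guard)
{
    struct model_dev dev = { MODEL_STATE_UP, { 1 }, 0 };

    dev.state = MODEL_STATE_INTERNAL_ERROR;
    model_fw_completion(&dev, guard);
    model_flush(&dev);
    return dev.underflows;
}
```

In this model, `model_run(false)` reports one underflow because both the FW completion and the flush drop the same reference, while `model_run(true)` reports none because the FW completion is discarded and only the flush completes the command.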
