net/mlx5e: Fix mlx5e_priv_init() cleanup flow

Kamal Heib requested to merge kheib/centos-stream-9:37426 into main

JIRA: https://issues.redhat.com/browse/RHEL-37426
CVE: CVE-2024-35959

commit ecb829459a841198e142f72fadab56424ae96519
Author: Carolina Jubran <cjubran@nvidia.com>
Date: Tue Apr 9 22:08:15 2024 +0300

net/mlx5e: Fix mlx5e_priv_init() cleanup flow  

When mlx5e_priv_init() fails, the cleanup flow calls mlx5e_selq_cleanup which  
calls mlx5e_selq_apply() that assures that the `priv->state_lock` is held using  
lockdep_is_held().  

Acquire the state_lock in mlx5e_selq_cleanup().  

Kernel log:  
=============================  
WARNING: suspicious RCU usage  
6.8.0-rc3_net_next_841a9b5 #1 Not tainted  
-----------------------------  
drivers/net/ethernet/mellanox/mlx5/core/en/selq.c:124 suspicious rcu_dereference_protected() usage!  

other info that might help us debug this:  

rcu_scheduler_active = 2, debug_locks = 1  
2 locks held by systemd-modules/293:  

stack backtrace:  
CPU: 4 PID: 293 Comm: systemd-modules Not tainted 6.8.0-rc3_net_next_841a9b5 #1  
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014  
Call Trace:  
 <TASK>  
 dump_stack_lvl+0x8a/0xa0  
 lockdep_rcu_suspicious+0x154/0x1a0  
 mlx5e_selq_apply+0x94/0xa0 [mlx5_core]  
 mlx5e_selq_cleanup+0x3a/0x60 [mlx5_core]  
 mlx5e_priv_init+0x2be/0x2f0 [mlx5_core]  
 mlx5_rdma_setup_rn+0x7c/0x1a0 [mlx5_core]  
 rdma_init_netdev+0x4e/0x80 [ib_core]  
 ? mlx5_rdma_netdev_free+0x70/0x70 [mlx5_core]  
 ipoib_intf_init+0x64/0x550 [ib_ipoib]  
 ipoib_intf_alloc+0x4e/0xc0 [ib_ipoib]  
 ipoib_add_one+0xb0/0x360 [ib_ipoib]  
 add_client_context+0x112/0x1c0 [ib_core]  
 ib_register_client+0x166/0x1b0 [ib_core]  
 ? 0xffffffffa0573000  
 ipoib_init_module+0xeb/0x1a0 [ib_ipoib]  
 do_one_initcall+0x61/0x250  
 do_init_module+0x8a/0x270  
 init_module_from_file+0x8b/0xd0  
 idempotent_init_module+0x17d/0x230  
 __x64_sys_finit_module+0x61/0xb0  
 do_syscall_64+0x71/0x140  
 entry_SYSCALL_64_after_hwframe+0x46/0x4e  
 </TASK>  

Fixes: 8bf30be75069 ("net/mlx5e: Introduce select queue parameters")  
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>  
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>  
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>  
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>  
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>  
Link: https://lore.kernel.org/r/20240409190820.227554-8-tariqt@nvidia.com  
Signed-off-by: Jakub Kicinski <kuba@kernel.org>  

Signed-off-by: Kamal Heib <kheib@redhat.com>
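
For reviewers who want the locking rule from the commit message in isolation, here is a minimal user-space sketch. It is not the driver code or the upstream diff: `model_lock`, `selq_model`, `selq_apply()`, `selq_cleanup()` and `selq_run()` are hypothetical stand-ins, the "lock" is only a flag that records whether it is held, and the violation counter plays the role of the lockdep splat shown in the kernel log. The assumption being modeled is just what the description states: the apply step checks that the state lock is held, so the cleanup path has to take it first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy lock: only records whether it is held (stand-in for a mutex plus lockdep). */
struct model_lock {
	bool held;
};

/* Hypothetical model of the select-queue state; not the driver's struct. */
struct selq_model {
	struct model_lock *state_lock; /* models priv->state_lock */
	void *active;
	void *standby;
	unsigned int violations;       /* times apply ran without the lock held */
};

static void model_lock_acquire(struct model_lock *l) { l->held = true; }
static void model_lock_release(struct model_lock *l) { l->held = false; }

/* Models the apply step: swaps active and standby, expects the lock to be held. */
static void selq_apply(struct selq_model *s)
{
	void *old = s->active;

	if (!s->state_lock->held)
		s->violations++; /* the kernel would report "suspicious RCU usage" here */

	s->active = s->standby;
	s->standby = old;
}

/* Models the cleanup path; take_lock == false is the pre-fix behaviour. */
static void selq_cleanup(struct selq_model *s, bool take_lock)
{
	if (take_lock)
		model_lock_acquire(s->state_lock);

	free(s->standby);
	s->standby = NULL;
	selq_apply(s); /* active becomes NULL, the old active moves to standby */

	if (take_lock)
		model_lock_release(s->state_lock);

	free(s->standby);
	s->standby = NULL;
}

/* Runs one cleanup and returns how many lock violations it triggered. */
static unsigned int selq_run(bool take_lock)
{
	struct model_lock lock = { .held = false };
	struct selq_model s = {
		.state_lock = &lock,
		.active = malloc(16),
		.standby = malloc(16),
		.violations = 0,
	};

	selq_cleanup(&s, take_lock);
	assert(s.active == NULL && s.standby == NULL);
	return s.violations;
}
```

In this model, `selq_run(false)` records one violation because the apply step runs without the lock, while `selq_run(true)` records none. That mirrors the intent of acquiring `state_lock` in `mlx5e_selq_cleanup()`, under the assumptions stated above.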
