1. 27 Feb, 2019 1 commit
    • RDMA/mthca: Clear QP objects during their allocation · f7a43c65
      Leon Romanovsky authored
      [ Upstream commit 9d9f59b4 ]
      
      As part of the audit process to update drivers to use rdma_restrack_add(),
      ensure that QP objects are cleared before access. This change fixes the
      crash observed when ib_destroy_qp() accesses an uninitialized, non-zero
      sgid attr.
      
      CPU: 3 PID: 74 Comm: kworker/u16:1 Not tainted 4.19.10-300.fc29.x86_64
      Workqueue: ipoib_wq ipoib_cm_tx_reap [ib_ipoib]
      RIP: 0010:rdma_put_gid_attr+0x9/0x30 [ib_core]
      RSP: 0018:ffffb7ad819dbde8 EFLAGS: 00010202
      RAX: 0000000000000000 RBX: ffff8d1bdf5a2e00 RCX: 0000000000002699
      RDX: 206c656e72656af8 RSI: ffff8d1bf7ae6160 RDI: 206c656e72656b20
      RBP: 0000000000000000 R08: 0000000000026160 R09: ffffffffc06b45bf
      R10: ffffe849887da000 R11: 0000000000000002 R12: ffff8d1be30cb400
      R13: ffff8d1bdf681800 R14: ffff8d1be2272400 R15: ffff8d1be30ca000
      FS:  0000000000000000(0000) GS:ffff8d1bf7ac0000(0000)
      knlGS:0000000000000000
      Trace:
       ib_destroy_qp+0xc9/0x240 [ib_core]
       ipoib_cm_tx_reap+0x1f9/0x4e0 [ib_ipoib]
       process_one_work+0x1a1/0x3a0
       worker_thread+0x30/0x380
       ? pwq_unbound_release_workfn+0xd0/0xd0
       kthread+0x112/0x130
       ? kthread_create_worker_on_cpu+0x70/0x70
       ret_from_fork+0x22/0x40
      Reported-by: Alexander Murashkin <AlexanderMurashkin@msn.com>
      Tested-by: Alexander Murashkin <AlexanderMurashkin@msn.com>
      Fixes: 1a1f460f ("RDMA: Hold the sgid_attr inside the struct ib_ah/qp")
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
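The failure mode described in this commit can be illustrated in userspace C. The following is a minimal sketch, not the mthca code: `struct demo_qp`, `struct demo_gid_attr`, `demo_qp_alloc_zeroed()` and `demo_qp_destroy()` are hypothetical names, and `calloc()` stands in for a zeroing kernel allocator. It assumes the destroy path releases the GID attribute whenever the pointer is non-NULL, which is why the object has to be cleared at allocation time.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not the kernel's structures. */
struct demo_gid_attr {
    int refcount;
};

struct demo_qp {
    unsigned int qp_num;
    struct demo_gid_attr *sgid_attr; /* optional; NULL when never set */
};

/* Zeroing allocation: pointer fields start out as NULL. */
struct demo_qp *demo_qp_alloc_zeroed(void)
{
    return calloc(1, sizeof(struct demo_qp));
}

/*
 * Destroy path modelled on the crash description: it drops a reference
 * through sgid_attr whenever the pointer is non-NULL, so it is only safe
 * if the allocation cleared the object.
 * Returns 1 if a GID attribute reference was released, 0 otherwise.
 */
int demo_qp_destroy(struct demo_qp *qp)
{
    int released = 0;

    assert(qp != NULL);
    if (qp->sgid_attr) {
        qp->sgid_attr->refcount--;
        released = 1;
    }
    free(qp);
    return released;
}
```

With a non-zeroing allocation, `sgid_attr` could hold leftover bytes (the register dump above shows values that look like ASCII text), and the destroy path would dereference them.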
  2. 12 Feb, 2019 1 commit
  3. 06 Feb, 2019 1 commit
  4. 26 Jan, 2019 1 commit
    • IB/usnic: Fix potential deadlock · c0ce3a40
      Parvi Kaustubhi authored
      [ Upstream commit 8036e90f ]
      
      Acquiring the rtnl lock while holding usdev_lock could result in a
      deadlock.
      
      For example:
      
      usnic_ib_query_port()
      | mutex_lock(&us_ibdev->usdev_lock)
       | ib_get_eth_speed()
        | rtnl_lock()
      
      rtnl_lock()
      | usnic_ib_netdevice_event()
       | mutex_lock(&us_ibdev->usdev_lock)
      
      This commit moves the usdev_lock acquisition after the rtnl lock has been
      released.
      
      This is safe to do because usdev_lock is not protecting anything being
      accessed in ib_get_eth_speed(). Hence, the correct order of holding locks
      (rtnl -> usdev_lock) is not violated.
      Signed-off-by: Parvi Kaustubhi <pkaustub@cisco.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
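The lock-ordering argument in this commit can be modelled without real threads. The sketch below is an assumption-laden illustration, not the usnic driver: all `demo_*` names are hypothetical, and the "locks" are plain flags that only record whether rtnl was requested while usdev_lock was held.

```c
#include <stdbool.h>

/* Hypothetical lock model: tracks held state only, no real threads. */
static bool usdev_held;
static int order_violations; /* times rtnl was taken while usdev was held */

static void demo_rtnl_lock(void)
{
    if (usdev_held)
        order_violations++; /* inverse of the rtnl -> usdev_lock order */
}
static void demo_rtnl_unlock(void) { }
static void demo_usdev_lock(void) { usdev_held = true; }
static void demo_usdev_unlock(void) { usdev_held = false; }

/* Stand-in for a helper that takes and releases rtnl internally. */
static void demo_get_eth_speed(void)
{
    demo_rtnl_lock();
    demo_rtnl_unlock();
}

/* Ordering before the fix: helper called with usdev_lock held. */
void demo_query_port_old(void)
{
    demo_usdev_lock();
    demo_get_eth_speed();
    demo_usdev_unlock();
}

/* Ordering after the fix: helper finishes before usdev_lock is taken. */
void demo_query_port_new(void)
{
    demo_get_eth_speed();
    demo_usdev_lock();
    demo_usdev_unlock();
}

int demo_order_violations(void) { return order_violations; }
```

Calling `demo_query_port_new()` records no violation, while `demo_query_port_old()` records one, which corresponds to the inverted order shown in the first call chain above.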
  5. 22 Jan, 2019 1 commit
  6. 09 Jan, 2019 1 commit
  7. 06 Dec, 2018 1 commit
  8. 03 Dec, 2018 3 commits
    • IB/mlx5: Fix implicit ODP interrupted page fault · 37b06e50
      Artemy Kovalyov authored
      Any page fault may be interrupted by an MMU invalidation, and an implicit
      leaf MR may be released during this process. The check of the parent
      value is therefore an unreliable condition for an implicit MR.
      Use another condition that we can rely on to determine whether the MR is implicit.
      
      Fixes: b4cfe447 ("IB/mlx5: Implement on demand paging by adding support for MMU notifiers")
      Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: Moni Shoua <monis@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/hfi1: Fix an out-of-bounds access in get_hw_stats · 36d84219
      Piotr Stankiewicz authored
      When running with KASAN, the following trace is produced:
      
      [   62.535888]
      
      ==================================================================
      [   62.544930] BUG: KASAN: slab-out-of-bounds in
      get_hw_stats+0x122/0x230 [hfi1]
      [   62.553856] Write of size 8 at addr ffff88080e8d6330 by task
      kworker/0:1/14
      
      [   62.565333] CPU: 0 PID: 14 Comm: kworker/0:1 Not tainted
      4.19.0-test-build-kasan+ #8
      [   62.575087] Hardware name: Intel Corporation S2600KPR/S2600KPR, BIOS
      SE5C610.86B.01.01.0019.101220160604 10/12/2016
      [   62.587951] Workqueue: events work_for_cpu_fn
      [   62.594050] Call Trace:
      [   62.598023]  dump_stack+0xc6/0x14c
      [   62.603089]  ? dump_stack_print_info.cold.1+0x2f/0x2f
      [   62.610041]  ? kmsg_dump_rewind_nolock+0x59/0x59
      [   62.616615]  ? get_hw_stats+0x122/0x230 [hfi1]
      [   62.622985]  print_address_description+0x6c/0x23c
      [   62.629744]  ? get_hw_stats+0x122/0x230 [hfi1]
      [   62.636108]  kasan_report.cold.6+0x241/0x308
      [   62.642365]  get_hw_stats+0x122/0x230 [hfi1]
      [   62.648703]  ? hfi1_alloc_rn+0x40/0x40 [hfi1]
      [   62.655088]  ? __kmalloc+0x110/0x240
      [   62.660695]  ? hfi1_alloc_rn+0x40/0x40 [hfi1]
      [   62.667142]  setup_hw_stats+0xd8/0x430 [ib_core]
      [   62.673972]  ? show_hfi+0x50/0x50 [hfi1]
      [   62.680026]  ib_device_register_sysfs+0x165/0x180 [ib_core]
      [   62.687995]  ib_register_device+0x5a2/0xa10 [ib_core]
      [   62.695340]  ? show_hfi+0x50/0x50 [hfi1]
      [   62.701421]  ? ib_unregister_device+0x2e0/0x2e0 [ib_core]
      [   62.709222]  ? __vmalloc_node_range+0x2d0/0x380
      [   62.716131]  ? rvt_driver_mr_init+0x11f/0x2d0 [rdmavt]
      [   62.723735]  ? vmalloc_node+0x5c/0x70
      [   62.729697]  ? rvt_driver_mr_init+0x11f/0x2d0 [rdmavt]
      [   62.737347]  ? rvt_driver_mr_init+0x1f5/0x2d0 [rdmavt]
      [   62.744998]  ? __rvt_alloc_mr+0x110/0x110 [rdmavt]
      [   62.752315]  ? rvt_rc_error+0x140/0x140 [rdmavt]
      [   62.759434]  ? rvt_vma_open+0x30/0x30 [rdmavt]
      [   62.766364]  ? mutex_unlock+0x1d/0x40
      [   62.772445]  ? kmem_cache_create_usercopy+0x15d/0x230
      [   62.780115]  rvt_register_device+0x1f6/0x360 [rdmavt]
      [   62.787823]  ? rvt_get_port_immutable+0x180/0x180 [rdmavt]
      [   62.796058]  ? __get_txreq+0x400/0x400 [hfi1]
      [   62.802969]  ? memcpy+0x34/0x50
      [   62.808611]  hfi1_register_ib_device+0xde6/0xeb0 [hfi1]
      [   62.816601]  ? hfi1_get_npkeys+0x10/0x10 [hfi1]
      [   62.823760]  ? hfi1_init+0x89f/0x9a0 [hfi1]
      [   62.830469]  ? hfi1_setup_eagerbufs+0xad0/0xad0 [hfi1]
      [   62.838204]  ? pcie_capability_clear_and_set_word+0xcd/0xe0
      [   62.846429]  ? pcie_capability_read_word+0xd0/0xd0
      [   62.853791]  ? hfi1_pcie_init+0x187/0x4b0 [hfi1]
      [   62.860958]  init_one+0x67f/0xae0 [hfi1]
      [   62.867301]  ? hfi1_init+0x9a0/0x9a0 [hfi1]
      [   62.873876]  ? wait_woken+0x130/0x130
      [   62.879860]  ? read_word_at_a_time+0xe/0x20
      [   62.886329]  ? strscpy+0x14b/0x280
      [   62.891998]  ? hfi1_init+0x9a0/0x9a0 [hfi1]
      [   62.898405]  local_pci_probe+0x70/0xd0
      [   62.904295]  ? pci_device_shutdown+0x90/0x90
      [   62.910833]  work_for_cpu_fn+0x29/0x40
      [   62.916750]  process_one_work+0x584/0x960
      [   62.922974]  ? rcu_work_rcufn+0x40/0x40
      [   62.928991]  ? __schedule+0x396/0xdc0
      [   62.934806]  ? __sched_text_start+0x8/0x8
      [   62.941020]  ? pick_next_task_fair+0x68b/0xc60
      [   62.947674]  ? run_rebalance_domains+0x260/0x260
      [   62.954471]  ? __list_add_valid+0x29/0xa0
      [   62.960607]  ? move_linked_works+0x1c7/0x230
      [   62.967077]  ?
      trace_event_raw_event_workqueue_execute_start+0x140/0x140
      [   62.976248]  ? mutex_lock+0xa6/0x100
      [   62.982029]  ? __mutex_lock_slowpath+0x10/0x10
      [   62.988795]  ? __switch_to+0x37a/0x710
      [   62.994731]  worker_thread+0x62e/0x9d0
      [   63.000602]  ? max_active_store+0xf0/0xf0
      [   63.006828]  ? __switch_to_asm+0x40/0x70
      [   63.012932]  ? __switch_to_asm+0x34/0x70
      [   63.019013]  ? __switch_to_asm+0x40/0x70
      [   63.025042]  ? __switch_to_asm+0x34/0x70
      [   63.031030]  ? __switch_to_asm+0x40/0x70
      [   63.037006]  ? __schedule+0x396/0xdc0
      [   63.042660]  ? kmem_cache_alloc_trace+0xf3/0x1f0
      [   63.049323]  ? kthread+0x59/0x1d0
      [   63.054594]  ? ret_from_fork+0x35/0x40
      [   63.060257]  ? __sched_text_start+0x8/0x8
      [   63.066212]  ? schedule+0xcf/0x250
      [   63.071529]  ? __wake_up_common+0x110/0x350
      [   63.077794]  ? __schedule+0xdc0/0xdc0
      [   63.083348]  ? wait_woken+0x130/0x130
      [   63.088963]  ? finish_task_switch+0x1f1/0x520
      [   63.095258]  ? kasan_unpoison_shadow+0x30/0x40
      [   63.101792]  ? __init_waitqueue_head+0xa0/0xd0
      [   63.108183]  ? replenish_dl_entity.cold.60+0x18/0x18
      [   63.115151]  ? _raw_spin_lock_irqsave+0x25/0x50
      [   63.121754]  ? max_active_store+0xf0/0xf0
      [   63.127753]  kthread+0x1ae/0x1d0
      [   63.132894]  ? kthread_bind+0x30/0x30
      [   63.138422]  ret_from_fork+0x35/0x40
      
      [   63.146973] Allocated by task 14:
      [   63.152077]  kasan_kmalloc+0xbf/0xe0
      [   63.157471]  __kmalloc+0x110/0x240
      [   63.162804]  init_cntrs+0x34d/0xdf0 [hfi1]
      [   63.168883]  hfi1_init_dd+0x29a3/0x2f90 [hfi1]
      [   63.175244]  init_one+0x551/0xae0 [hfi1]
      [   63.181065]  local_pci_probe+0x70/0xd0
      [   63.186759]  work_for_cpu_fn+0x29/0x40
      [   63.192310]  process_one_work+0x584/0x960
      [   63.198163]  worker_thread+0x62e/0x9d0
      [   63.203843]  kthread+0x1ae/0x1d0
      [   63.208874]  ret_from_fork+0x35/0x40
      
      [   63.217203] Freed by task 1:
      [   63.221844]  __kasan_slab_free+0x12e/0x180
      [   63.227844]  kfree+0x92/0x1a0
      [   63.232570]  single_release+0x3a/0x60
      [   63.238024]  __fput+0x1d9/0x480
      [   63.242911]  task_work_run+0x139/0x190
      [   63.248440]  exit_to_usermode_loop+0x191/0x1a0
      [   63.254814]  do_syscall_64+0x301/0x330
      [   63.260283]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [   63.270199] The buggy address belongs to the object at
      ffff88080e8d5500
       which belongs to the cache kmalloc-4096 of size 4096
      [   63.287247] The buggy address is located 3632 bytes inside of
       4096-byte region [ffff88080e8d5500, ffff88080e8d6500)
      [   63.303564] The buggy address belongs to the page:
      [   63.310447] page:ffffea00203a3400 count:1 mapcount:0
      mapping:ffff88081380e840 index:0x0 compound_mapcount: 0
      [   63.323102] flags: 0x2fffff80008100(slab|head)
      [   63.329775] raw: 002fffff80008100 0000000000000000 0000000100000001
      ffff88081380e840
      [   63.340175] raw: 0000000000000000 0000000000070007 00000001ffffffff
      0000000000000000
      [   63.350564] page dumped because: kasan: bad access detected
      
      [   63.361974] Memory state around the buggy address:
      [   63.369137]  ffff88080e8d6200: 00 00 00 00 00 00 00 00 00 00 00 00 00
      00 00 00
      [   63.379082]  ffff88080e8d6280: 00 00 00 00 00 00 00 00 00 00 00 00 00
      00 00 00
      [   63.389032] >ffff88080e8d6300: 00 00 00 00 00 00 fc fc fc fc fc fc fc
      fc fc fc
      [   63.398944]                                      ^
      [   63.406141]  ffff88080e8d6380: fc fc fc fc fc fc fc fc fc fc fc fc fc
      fc fc fc
      [   63.416109]  ffff88080e8d6400: fc fc fc fc fc fc fc fc fc fc fc fc fc
      fc fc fc
      [   63.426099]
      ==================================================================
      
      The trace happens because get_hw_stats() assumes there is room in the
      memory allocated in init_cntrs() to accommodate the driver counters.
      Unfortunately, that routine only allocated space for the device
      counters.
      
      Fix by ensuring the allocation has room for the additional driver
      counters.
      
      Cc: <Stable@vger.kernel.org> # v4.14+
      Fixes: b7481944 ("IB/hfi1: Show statistics counters under IB stats interface")
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Reviewed-by: Mike Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
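The sizing problem described in this commit can be sketched in userspace C. This is an illustration under stated assumptions, not the hfi1 code: `struct demo_stats` and the `demo_stats_*` functions are hypothetical, and the counter values are placeholders. The point is only that the buffer is sized for device counters plus driver counters, so a fill routine that writes both groups stays in bounds.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical counter block; sizes and names are illustrative only. */
struct demo_stats {
    size_t num_dev;   /* device counters */
    size_t num_drv;   /* additional driver counters */
    uint64_t *values; /* num_dev + num_drv slots */
};

/* Allocate room for both counter groups, not just the device counters. */
int demo_stats_init(struct demo_stats *s, size_t num_dev, size_t num_drv)
{
    s->values = calloc(num_dev + num_drv, sizeof(*s->values));
    if (!s->values)
        return -1;
    s->num_dev = num_dev;
    s->num_drv = num_drv;
    return 0;
}

/*
 * Fill every slot: device counters first, driver counters after them.
 * Returns the number of slots written, which matches the allocation.
 */
size_t demo_stats_fill(struct demo_stats *s)
{
    size_t i, total = s->num_dev + s->num_drv;

    for (i = 0; i < total; i++)
        s->values[i] = i; /* placeholder value */
    return total;
}

void demo_stats_free(struct demo_stats *s)
{
    free(s->values);
    s->values = NULL;
}
```

Had `demo_stats_init()` allocated only `num_dev` slots, the loop in `demo_stats_fill()` would write past the end of the buffer, which is the kind of access KASAN reports above.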
    • IB/hfi1: Fix a latency issue for small messages · 90b2620e
      Michael J. Ruhl authored
      A recent performance enhancement introduced a latency issue in the
      HFI message path.  The new algorithm removed a forced call send for
      PIO messages and added a forced schedule event for messages larger
      than the MTU.
      
      For PIO, the schedule path can introduce thrashing that can
      significantly impact the latency for small messages.
      
      If a message size is within the PIO threshold, always take the send
      path.
      
      Fixes: 0b79b277 ("IB/{hfi1, qib, rdmavt}: Schedule multi RC/UC packets instead of posting")
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
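The rule stated in this commit ("within the PIO threshold, always take the send path") can be written as a small decision helper. This is a hedged sketch only: `enum demo_path`, `demo_pick_path()` and the threshold parameter are hypothetical, the comparison being inclusive is an assumption, and what the driver does for larger messages is simplified here to "schedule".

```c
#include <stddef.h>

/* Hypothetical decision helper; names and threshold are illustrative. */
enum demo_path {
    DEMO_PATH_SEND,     /* call the send routine directly */
    DEMO_PATH_SCHEDULE  /* queue the work for a later send */
};

enum demo_path demo_pick_path(size_t msg_len, size_t pio_threshold)
{
    /* Messages within the PIO threshold always take the direct send path. */
    if (msg_len <= pio_threshold)
        return DEMO_PATH_SEND;

    /* Assumption: anything larger goes through the schedule path. */
    return DEMO_PATH_SCHEDULE;
}
```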
  9. 29 Nov, 2018 1 commit
  10. 26 Nov, 2018 2 commits
  11. 23 Nov, 2018 1 commit
  12. 21 Nov, 2018 5 commits
    • IB/mlx5: Avoid load failure due to unknown link width · db7a691a
      Michael Guralnik authored
      If the firmware reports a connection width that is not 1x, 4x, 8x or 12x,
      the driver fails during initialization.
      
      To prevent this failure every time a new width is introduced to the RDMA
      stack, set a default 4x width for widths that are unknown to
      the driver.
      
      This is needed to allow old kernels to run with new firmware.
      
      Cc: <stable@vger.kernel.org> # 4.1
      Fixes: 1b5daf11 ("IB/mlx5: Avoid using the MAD_IFC command under ISSI > 0 mode")
      Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
      Reviewed-by: Majd Dibbiny <majd@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
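The fallback described in this commit amounts to a `default` branch in the width translation. The sketch below is illustrative only: the `demo_*` enums and their numeric values are hypothetical and do not reflect the firmware or RDMA core encodings. It shows an unknown width being mapped to 4x instead of being treated as an error.

```c
/* Hypothetical encodings; real firmware and RDMA core values may differ. */
enum demo_fw_width {
    DEMO_FW_1X = 1,
    DEMO_FW_4X = 4,
    DEMO_FW_8X = 8,
    DEMO_FW_12X = 12
};

enum demo_ib_width {
    DEMO_IB_1X,
    DEMO_IB_4X,
    DEMO_IB_8X,
    DEMO_IB_12X
};

enum demo_ib_width demo_translate_width(int fw_width)
{
    switch (fw_width) {
    case DEMO_FW_1X:
        return DEMO_IB_1X;
    case DEMO_FW_4X:
        return DEMO_IB_4X;
    case DEMO_FW_8X:
        return DEMO_IB_8X;
    case DEMO_FW_12X:
        return DEMO_IB_12X;
    default:
        /* Unknown width: fall back to 4x rather than failing to load. */
        return DEMO_IB_4X;
    }
}
```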
    • IB/mlx5: Fix XRC QP support after introducing extended atomic · 13f8d9c1
      Yonatan Cohen authored
      Extended atomics are supported with RC and XRC QP types, but the commit
      cited in the Fixes line added an unneeded check to
      to_mlx5_access_flags. This broke XRC QPs.
      
      The following ib_atomic_bw invocation over XRC reproduces the issue:
         ib_atomic_bw -d mlx5_1 --connection=XRC --atomic_type=FETCH_AND_ADD
      
      It is safe to remove such checks because the QP type was already checked
      in ib_modify_qp_is_ok(), which was previously called from
      mlx5_ib_modify_qp.
      
      Fixes: a60109dc ("IB/mlx5: Add support for extended atomic operations")
      Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/bnxt_re: Avoid accessing the device structure after it is freed · a6c66d6a
      Selvin Xavier authored
      When bnxt_re_ib_reg returns failure, the device structure gets
      freed. The driver then tries to access the device pointer
      after it is freed.
      
      [ 4871.034744] Failed to register with netedev: 0xffffffa1
      [ 4871.034765] infiniband (null): Failed to register with IB: 0xffffffea
      [ 4871.046430] ==================================================================
      [ 4871.046437] BUG: KASAN: use-after-free in bnxt_re_task+0x63/0x180 [bnxt_re]
      [ 4871.046439] Write of size 4 at addr ffff880fa8406f48 by task kworker/u48:2/17813
      
      [ 4871.046443] CPU: 20 PID: 17813 Comm: kworker/u48:2 Kdump: loaded Tainted: G B OE  4.20.0-rc1+ #42
      [ 4871.046444] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.0.4 08/28/2014
      [ 4871.046447] Workqueue: bnxt_re bnxt_re_task [bnxt_re]
      [ 4871.046449] Call Trace:
      [ 4871.046454]  dump_stack+0x91/0xeb
      [ 4871.046458]  print_address_description+0x6a/0x2a0
      [ 4871.046461]  kasan_report+0x176/0x2d0
      [ 4871.046463]  ? bnxt_re_task+0x63/0x180 [bnxt_re]
      [ 4871.046466]  bnxt_re_task+0x63/0x180 [bnxt_re]
      [ 4871.046470]  process_one_work+0x216/0x5b0
      [ 4871.046471]  ? process_one_work+0x189/0x5b0
      [ 4871.046475]  worker_thread+0x4e/0x3d0
      [ 4871.046479]  kthread+0x10e/0x140
      [ 4871.046480]  ? process_one_work+0x5b0/0x5b0
      [ 4871.046482]  ? kthread_stop+0x220/0x220
      [ 4871.046486]  ret_from_fork+0x3a/0x50
      
      [ 4871.046492] The buggy address belongs to the page:
      [ 4871.046494] page:ffffea003ea10180 count:0 mapcount:0 mapping:0000000000000000 index:0x0
      [ 4871.046495] flags: 0x57ffffc0000000()
      [ 4871.046498] raw: 0057ffffc0000000 0000000000000000 ffffea003ea10188 0000000000000000
      [ 4871.046500] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
      [ 4871.046501] page dumped because: kasan: bad access detected
      
      Avoid accessing the device structure once it is freed.
      
      Fixes: 497158aa ("RDMA/bnxt_re: Fix the ib_reg failure cleanup")
      Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/bnxt_re: Fix system hang when registration with L2 driver fails · 3c4b1419
      Selvin Xavier authored
      The driver doesn't release the rtnl lock if registration with the
      L2 driver (bnxt_re_register_netdev) fails, and this causes a
      hang the next time the lock is requested.
      
      [  371.635416] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  371.635417] kworker/u48:1   D    0   634      2 0x80000000
      [  371.635423] Workqueue: bnxt_re bnxt_re_task [bnxt_re]
      [  371.635424] Call Trace:
      [  371.635426]  ? __schedule+0x36b/0xbd0
      [  371.635429]  schedule+0x39/0x90
      [  371.635430]  schedule_preempt_disabled+0x11/0x20
      [  371.635431]  __mutex_lock+0x45b/0x9c0
      [  371.635433]  ? __mutex_lock+0x16d/0x9c0
      [  371.635435]  ? bnxt_re_ib_reg+0x2b/0xb30 [bnxt_re]
      [  371.635438]  ? wake_up_klogd+0x37/0x40
      [  371.635442]  bnxt_re_ib_reg+0x2b/0xb30 [bnxt_re]
      [  371.635447]  bnxt_re_task+0xfd/0x180 [bnxt_re]
      [  371.635449]  process_one_work+0x216/0x5b0
      [  371.635450]  ? process_one_work+0x189/0x5b0
      [  371.635453]  worker_thread+0x4e/0x3d0
      [  371.635455]  kthread+0x10e/0x140
      [  371.635456]  ? process_one_work+0x5b0/0x5b0
      [  371.635458]  ? kthread_stop+0x220/0x220
      [  371.635460]  ret_from_fork+0x3a/0x50
      [  371.635477] INFO: task NetworkManager:1228 blocked for more than 120 seconds.
      [  371.635478]       Tainted: G    B      OE     4.20.0-rc1+ #42
      [  371.635479] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      
      Release the rtnl_lock correctly in the failure path.
      
      Fixes: de5c95d0 ("RDMA/bnxt_re: Fix system crash during RDMA resource initialization")
      Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
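Releasing a lock on every exit path is commonly done with a single `goto` label. The sketch below illustrates that pattern under stated assumptions and is not the bnxt_re code: the `demo_*` functions are hypothetical, and the lock is a plain counter so that a missed unlock is detectable.

```c
#include <assert.h>

static int demo_lock_depth; /* 0 = unlocked, 1 = locked */

static void demo_lock(void)
{
    assert(demo_lock_depth == 0); /* a second acquire would hang */
    demo_lock_depth = 1;
}

static void demo_unlock(void)
{
    assert(demo_lock_depth == 1);
    demo_lock_depth = 0;
}

/* Stand-in for a registration step that may fail. */
static int demo_register_netdev(int should_fail)
{
    return should_fail ? -1 : 0;
}

int demo_ib_reg(int fail_netdev)
{
    int rc;

    demo_lock();
    rc = demo_register_netdev(fail_netdev);
    if (rc)
        goto unlock; /* the failure path still releases the lock */

    /* ... further setup that needs the lock would go here ... */
unlock:
    demo_unlock();
    return rc;
}

int demo_lock_is_held(void) { return demo_lock_depth; }
```

Because both the success and failure paths pass through `unlock:`, a later call to `demo_ib_reg()` can take the lock again even after an earlier registration failure.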
    • RDMA/mlx5: Fix fence type for IB_WR_LOCAL_INV WR · 074fca3a
      Majd Dibbiny authored
      Currently, for IB_WR_LOCAL_INV WR, when the next fence is None, the
      current fence will be SMALL instead of Normal Fence.
      
      Without this patch, krping doesn't work on CX-5 devices and fails with
      the following error messages from the CX-5 driver (server side):
      
      [ 710.434014] mlx5_0:dump_cqe:278:(pid 2712): dump error cqe
      [ 710.434016] 00000000 00000000 00000000 00000000
      [ 710.434016] 00000000 00000000 00000000 00000000
      [ 710.434017] 00000000 00000000 00000000 00000000
      [ 710.434018] 00000000 93003204 100000b8 000524d2
      [ 710.434019] krping: cq completion failed with wr_id 0 status 4 opcode 128 vender_err 32
      
      Fixed the logic to set the correct fence type.
      
      Fixes: 6e8484c5 ("RDMA/mlx5: set UMR wqe fence according to HCA cap")
      Signed-off-by: Majd Dibbiny <majd@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  13. 26 Oct, 2018 1 commit
  14. 18 Oct, 2018 1 commit
    • net/mlx5: Refactor fragmented buffer struct fields and init flow · 4972e6fa
      Tariq Toukan authored
      Take struct mlx5_frag_buf out of mlx5_frag_buf_ctrl, as it is not
      needed to manage and control the datapath of the fragmented buffers API.
      
      struct mlx5_frag_buf contains control info to manage the allocation
      and de-allocation of the fragmented buffer.
      Its fields are not relevant for datapath, so here I take them out of the
      struct mlx5_frag_buf_ctrl, except for the fragments array itself.
      
      In addition, modified mlx5_fill_fbc to initialise the frags pointers
      as well. This implies that the buffer must be allocated before the
      function is called.
      
      A set of type-specific *_get_byte_size() functions are replaced by
      a generic one.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
  15. 17 Oct, 2018 7 commits
  16. 16 Oct, 2018 12 commits