1. 26 Sep, 2020 40 commits
    • Oleksandr Natalenko's avatar
      ae55e2e4
    • Oleksandr Natalenko's avatar
      Merge tag 'v5.8.12' into stable-5.8 · ea7fd0c9
      Oleksandr Natalenko authored
      This is the 5.8.12 stable release
      ea7fd0c9
    • Greg Kroah-Hartman's avatar
    • Maor Dickman's avatar
      net/mlx5e: Fix endianness when calculating pedit mask first bit · 973f3e32
      Maor Dickman authored
      [ Upstream commit 82198d8b ]
      
      The field mask value is provided in network byte order and has to
      be converted to host byte order before calculating pedit mask
      first bit.
      
      Fixes: 88f30bbc ("net/mlx5e: Bit sized fields rewrite support")
      Signed-off-by: Maor Dickman <[email protected]>
      Reviewed-by: Roi Dayan <[email protected]>
      Signed-off-by: Saeed Mahameed <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      973f3e32
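The byte-order issue described in this commit can be illustrated with a small userspace sketch. It is not the mlx5e code: the helper names `first_set_bit` and `mask_first_bit` are hypothetical, and a 32-bit mask is assumed. It only shows why a mask received in network byte order should be converted to host order before looking for its first set bit.

```c
#include <arpa/inet.h> /* ntohl(), htonl() */
#include <stdint.h>

/* 0-based index of the lowest set bit, or -1 if no bit is set. */
static int first_set_bit(uint32_t v)
{
	int bit = 0;

	if (v == 0)
		return -1;
	while ((v & 1u) == 0) {
		v >>= 1;
		bit++;
	}
	return bit;
}

/*
 * Hypothetical helper: mask_be arrives in network byte order, so it is
 * converted to host byte order before the bit scan.
 */
static int mask_first_bit(uint32_t mask_be)
{
	return first_set_bit(ntohl(mask_be));
}
```

On a little-endian host, scanning `htonl(0x00000100)` without the conversion would report bit 16 instead of bit 8, which is the kind of mismatch the commit message describes.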
    • Maxim Mikityanskiy's avatar
      net/mlx5e: Use synchronize_rcu to sync with NAPI · 1fa560ec
      Maxim Mikityanskiy authored
      [ Upstream commit 9c25a22d ]
      
      As described in the previous commit, napi_synchronize doesn't quite fit
      the purpose when we just need to wait until the currently running NAPI
      quits. Its implementation waits until NAPI is not running by polling and
      waiting for 1ms in between. In cases where we need to deactivate one
      queue (e.g., recovery flows) or where we deactivate them one-by-one
      (deactivate channel flow), we may get stuck in napi_synchronize forever
      if other queues keep NAPI active, causing a soft lockup. Depending on
      kernel configuration (CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC), it may result
      in a kernel panic.
      
      To fix the issue, use synchronize_rcu to wait for NAPI to quit, and wrap
      the whole NAPI in rcu_read_lock.
      
      Fixes: acc6c595 ("net/mlx5e: Split open/close channels to stages")
      Signed-off-by: Maxim Mikityanskiy <[email protected]>
      Reviewed-by: Tariq Toukan <[email protected]>
      Signed-off-by: Saeed Mahameed <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      1fa560ec
    • Maxim Mikityanskiy's avatar
      net/mlx5e: Use RCU to protect rq->xdp_prog · cb02cd23
      Maxim Mikityanskiy authored
      [ Upstream commit fe45386a ]
      
      Currently, the RQs are temporarily deactivated while hot-replacing the
      XDP program, and napi_synchronize is used to make sure rq->xdp_prog is
      not in use. However, napi_synchronize is not ideal: instead of waiting
      till the end of a NAPI cycle, it polls and waits until NAPI is not
      running, sleeping for 1ms between the periodic checks. Under heavy
      workloads, this loop will never end, which may even lead to a kernel
      panic if the kernel detects the hangup. Such workloads include XSK TX
      and possibly also heavy RX (XSK or normal).
      
      The fix is inspired by commit 326fe02d ("net/mlx4_en: protect
      ring->xdp_prog with rcu_read_lock"). As mlx5e_xdp_handle is already
      protected by rcu_read_lock, and bpf_prog_put uses call_rcu to free the
      program, there is no need for additional synchronization if proper RCU
      functions are used to access the pointer. This patch converts all
      accesses to rq->xdp_prog to use RCU functions.
      
      Fixes: 86994156 ("net/mlx5e: XDP fast RX drop bpf programs support")
      Fixes: db05815b ("net/mlx5e: Add XSK zero-copy support")
      Signed-off-by: Maxim Mikityanskiy <[email protected]>
      Reviewed-by: Tariq Toukan <[email protected]>
      Signed-off-by: Saeed Mahameed <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      cb02cd23
    • Taehee Yoo's avatar
      Revert "netns: don't disable BHs when locking "nsid_lock"" · bb677668
      Taehee Yoo authored
      [ Upstream commit e1f469cd ]
      
      This reverts commit 8d7e5dee.
      
      To protect netns id, the nsid_lock is used when netns id is being
      allocated and removed by peernet2id_alloc() and unhash_nsid().
      The nsid_lock can be used in BH context but only spin_lock() is used
      in this code.
      Using spin_lock() instead of spin_lock_bh() can result in a deadlock in
      the following scenario reported by the lockdep.
      In order to avoid a deadlock, the spin_lock_bh() should be used instead
      of spin_lock() to acquire nsid_lock.
      
      Test commands:
          ip netns del nst
          ip netns add nst
          ip link add veth1 type veth peer name veth2
          ip link set veth1 netns nst
          ip netns exec nst ip link add name br1 type bridge vlan_filtering 1
          ip netns exec nst ip link set dev br1 up
          ip netns exec nst ip link set dev veth1 master br1
          ip netns exec nst ip link set dev veth1 up
          ip netns exec nst ip link add macvlan0 link br1 up type macvlan
      
      Splat looks like:
      [   33.615860][  T607] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
      [   33.617194][  T607] 5.9.0-rc1+ #665 Not tainted
      [ ... ]
      [   33.670615][  T607] Chain exists of:
      [   33.670615][  T607]   &mc->mca_lock --> &bridge_netdev_addr_lock_key --> &net->nsid_lock
      [   33.670615][  T607]
      [   33.673118][  T607]  Possible interrupt unsafe locking scenario:
      [   33.673118][  T607]
      [   33.674599][  T607]        CPU0                    CPU1
      [   33.675557][  T607]        ----                    ----
      [   33.676516][  T607]   lock(&net->nsid_lock);
      [   33.677306][  T607]                                local_irq_disable();
      [   33.678517][  T607]                                lock(&mc->mca_lock);
      [   33.679725][  T607]                                lock(&bridge_netdev_addr_lock_key);
      [   33.681166][  T607]   <Interrupt>
      [   33.681791][  T607]     lock(&mc->mca_lock);
      [   33.682579][  T607]
      [   33.682579][  T607]  *** DEADLOCK ***
      [ ... ]
      [   33.922046][  T607] stack backtrace:
      [   33.922999][  T607] CPU: 3 PID: 607 Comm: ip Not tainted 5.9.0-rc1+ #665
      [   33.924099][  T607] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [   33.925714][  T607] Call Trace:
      [   33.926238][  T607]  dump_stack+0x78/0xab
      [   33.926905][  T607]  check_irq_usage+0x70b/0x720
      [   33.927708][  T607]  ? iterate_chain_key+0x60/0x60
      [   33.928507][  T607]  ? check_path+0x22/0x40
      [   33.929201][  T607]  ? check_noncircular+0xcf/0x180
      [   33.930024][  T607]  ? __lock_acquire+0x1952/0x1f20
      [   33.930860][  T607]  __lock_acquire+0x1952/0x1f20
      [   33.931667][  T607]  lock_acquire+0xaf/0x3a0
      [   33.932366][  T607]  ? peernet2id_alloc+0x3a/0x170
      [   33.933147][  T607]  ? br_port_fill_attrs+0x54c/0x6b0 [bridge]
      [   33.934140][  T607]  ? br_port_fill_attrs+0x5de/0x6b0 [bridge]
      [   33.935113][  T607]  ? kvm_sched_clock_read+0x14/0x30
      [   33.935974][  T607]  _raw_spin_lock+0x30/0x70
      [   33.936728][  T607]  ? peernet2id_alloc+0x3a/0x170
      [   33.937523][  T607]  peernet2id_alloc+0x3a/0x170
      [   33.938313][  T607]  rtnl_fill_ifinfo+0xb5e/0x1400
      [   33.939091][  T607]  rtmsg_ifinfo_build_skb+0x8a/0xf0
      [   33.939953][  T607]  rtmsg_ifinfo_event.part.39+0x17/0x50
      [   33.940863][  T607]  rtmsg_ifinfo+0x1f/0x30
      [   33.941571][  T607]  __dev_notify_flags+0xa5/0xf0
      [   33.942376][  T607]  ? __irq_work_queue_local+0x49/0x50
      [   33.943249][  T607]  ? irq_work_queue+0x1d/0x30
      [   33.943993][  T607]  ? __dev_set_promiscuity+0x7b/0x1a0
      [   33.944878][  T607]  __dev_set_promiscuity+0x7b/0x1a0
      [   33.945758][  T607]  dev_set_promiscuity+0x1e/0x50
      [   33.946582][  T607]  br_port_set_promisc+0x1f/0x40 [bridge]
      [   33.947487][  T607]  br_manage_promisc+0x8b/0xe0 [bridge]
      [   33.948388][  T607]  __dev_set_promiscuity+0x123/0x1a0
      [   33.949244][  T607]  __dev_set_rx_mode+0x68/0x90
      [   33.950021][  T607]  dev_uc_add+0x50/0x60
      [   33.950720][  T607]  macvlan_open+0x18e/0x1f0 [macvlan]
      [   33.951601][  T607]  __dev_open+0xd6/0x170
      [   33.952269][  T607]  __dev_change_flags+0x181/0x1d0
      [   33.953056][  T607]  rtnl_configure_link+0x2f/0xa0
      [   33.953884][  T607]  __rtnl_newlink+0x6b9/0x8e0
      [   33.954665][  T607]  ? __lock_acquire+0x95d/0x1f20
      [   33.955450][  T607]  ? lock_acquire+0xaf/0x3a0
      [   33.956193][  T607]  ? is_bpf_text_address+0x5/0xe0
      [   33.956999][  T607]  rtnl_newlink+0x47/0x70
      Acked-by: Guillaume Nault <[email protected]>
      Fixes: 8d7e5dee ("netns: don't disable BHs when locking "nsid_lock"")
      Reported-by: [email protected]
      Signed-off-by: Taehee Yoo <[email protected]>
      Signed-off-by: Jakub Kicinski <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      bb677668
    • Parshuram Thombare's avatar
      net: macb: fix for pause frame receive enable bit · e1338ca1
      Parshuram Thombare authored
      [ Upstream commit d7739b0b ]
      
      PAE bit of NCFGR register, when set, pauses transmission
      if a non-zero 802.3 classic pause frame is received.
      
      Fixes: 7897b071 ("net: macb: convert to phylink")
      Signed-off-by: Parshuram Thombare <[email protected]>
      Signed-off-by: Jakub Kicinski <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      e1338ca1
    • Matthias Schiffer's avatar
      net: dsa: microchip: ksz8795: really set the correct number of ports · d8c79a8b
      Matthias Schiffer authored
      [ Upstream commit fd944dc2 ]
      
      The KSZ9477 and KSZ8795 use the port_cnt field differently: For the
      KSZ9477, it includes the CPU port(s), while for the KSZ8795, it doesn't.
      
      It would be a good cleanup to make the handling of both drivers match,
      but as a first step, fix the recently broken assignment of num_ports in
      the KSZ8795 driver (which completely broke probing, as the CPU port
      index was always failing the num_ports check).
      
      Fixes: af199a1a ("net: dsa: microchip: set the correct number of ports")
      Signed-off-by: Matthias Schiffer <[email protected]>
      Reviewed-by: Codrin Ciubotariu <[email protected]>
      Signed-off-by: David S. Miller <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      d8c79a8b
    • Vladimir Oltean's avatar
      net: dsa: link interfaces with the DSA master to get rid of lockdep warnings · 86ae8bde
      Vladimir Oltean authored
      [ Upstream commit 2f1e8ea7 ]
      
      Since commit 845e0ebb ("net: change addr_list_lock back to static
      key"), cascaded DSA setups (DSA switch port as DSA master for another
      DSA switch port) are emitting this lockdep warning:
      
      ============================================
      WARNING: possible recursive locking detected
      5.8.0-rc1-00133-g923e4b5032dd-dirty #208 Not tainted
      --------------------------------------------
      dhcpcd/323 is trying to acquire lock:
      ffff000066dd4268 (&dsa_master_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync+0x44/0x90
      
      but task is already holding lock:
      ffff00006608c268 (&dsa_master_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync+0x44/0x90
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&dsa_master_addr_list_lock_key/1);
        lock(&dsa_master_addr_list_lock_key/1);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      3 locks held by dhcpcd/323:
       #0: ffffdbd1381dda18 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x24/0x30
       #1: ffff00006614b268 (_xmit_ETHER){+...}-{2:2}, at: dev_set_rx_mode+0x28/0x48
       #2: ffff00006608c268 (&dsa_master_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync+0x44/0x90
      
      stack backtrace:
      Call trace:
       dump_backtrace+0x0/0x1e0
       show_stack+0x20/0x30
       dump_stack+0xec/0x158
       __lock_acquire+0xca0/0x2398
       lock_acquire+0xe8/0x440
       _raw_spin_lock_nested+0x64/0x90
       dev_mc_sync+0x44/0x90
       dsa_slave_set_rx_mode+0x34/0x50
       __dev_set_rx_mode+0x60/0xa0
       dev_mc_sync+0x84/0x90
       dsa_slave_set_rx_mode+0x34/0x50
       __dev_set_rx_mode+0x60/0xa0
       dev_set_rx_mode+0x30/0x48
       __dev_open+0x10c/0x180
       __dev_change_flags+0x170/0x1c8
       dev_change_flags+0x2c/0x70
       devinet_ioctl+0x774/0x878
       inet_ioctl+0x348/0x3b0
       sock_do_ioctl+0x50/0x310
       sock_ioctl+0x1f8/0x580
       ksys_ioctl+0xb0/0xf0
       __arm64_sys_ioctl+0x28/0x38
       el0_svc_common.constprop.0+0x7c/0x180
       do_el0_svc+0x2c/0x98
       el0_sync_handler+0x9c/0x1b8
       el0_sync+0x158/0x180
      
      Since DSA never made use of the netdev API for describing links between
      upper devices and lower devices, the dev->lower_level value of a DSA
      switch interface would be 1, which would warn when it is a DSA master.
      
      We can use netdev_upper_dev_link() to describe the relationship between
      a DSA slave and a DSA master. To be precise, a DSA "slave" (switch port)
      is an "upper" to a DSA "master" (host port). The relationship is "many
      uppers to one lower", like in the case of VLAN. So, for that reason, we
      use the same function as VLAN uses.
      
      There might be a chance that somebody will try to take hold of this
      interface and use it immediately after register_netdev() and before
      netdev_upper_dev_link(). To avoid that, we do the registration and
      linkage while holding the RTNL, and we use the RTNL-locked cousin of
      register_netdev(), which is register_netdevice().
      
      Since this warning was not there when lockdep was using dynamic keys for
      addr_list_lock, we are blaming the lockdep patch itself. The network
      stack _has_ been using static lockdep keys before, and it _is_ likely
      that stacked DSA setups have been triggering these lockdep warnings
      since forever, however I can't test very old kernels on this particular
      stacked DSA setup, to ensure I'm not in fact introducing regressions.
      
      Fixes: 845e0ebb ("net: change addr_list_lock back to static key")
      Suggested-by: Cong Wang <[email protected]>
      Signed-off-by: Vladimir Oltean <[email protected]>
      Reviewed-by: Florian Fainelli <[email protected]>
      Signed-off-by: David S. Miller <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      86ae8bde
    • Dexuan Cui's avatar
      hv_netvsc: Fix hibernation for mlx5 VF driver · 150feba3
      Dexuan Cui authored
      [ Upstream commit 19162fd4 ]
      
      mlx5_suspend()/resume() keep the network interface, so during hibernation
      netvsc_unregister_vf() and netvsc_register_vf() are not called, and hence
      netvsc_resume() should call netvsc_vf_changed() to switch the data path
      back to the VF after hibernation. Note: after we close and re-open the
      vmbus channel of the netvsc NIC in netvsc_suspend() and netvsc_resume(),
      the data path is implicitly switched to the netvsc NIC. Similarly,
      netvsc_suspend() should not call netvsc_unregister_vf(), otherwise the VF
      can no longer be used after hibernation.
      
      For mlx4, since the VF network interface is explicitly destroyed and
      re-created during hibernation (see mlx4_suspend()/resume()), hv_netvsc
      already explicitly switches the data path from and to the VF automatically
      via netvsc_register_vf() and netvsc_unregister_vf(), so mlx4 doesn't need
      this fix. Note: mlx4 can still work with the fix because in
      netvsc_suspend()/resume() ndev_ctx->vf_netdev is NULL for mlx4.
      
      Fixes: 0efeea5f ("hv_netvsc: Add the support of hibernation")
      Signed-off-by: Dexuan Cui <[email protected]>
      Signed-off-by: Jakub Kicinski <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      150feba3
    • Luo bin's avatar
      hinic: fix rewaking txq after netif_tx_disable · 5b569fc5
      Luo bin authored
      [ Upstream commit a1b80e01 ]
      
      When calling hinic_close in hinic_set_channels, all queues are
      stopped after netif_tx_disable, but some queues may be rewoken in
      free_tx_poll by mistake while the driver is handling a tx irq. If a
      queue is rewoken, the core may call hinic_xmit_frame to send a packet
      shortly after netif_tx_disable, which may result in accessing
      memory that has already been freed in hinic_close. So we call
      napi_disable before netif_tx_disable in hinic_close to fix this bug.
      
      Fixes: 2eed5a8b ("hinic: add set_channels ethtool_ops support")
      Signed-off-by: Luo bin <[email protected]>
      Reviewed-by: Jakub Kicinski <[email protected]>
      Signed-off-by: David S. Miller <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      5b569fc5
    • Jianbo Liu's avatar
      net/mlx5e: Fix memory leak of tunnel info when rule under multipath not ready · 3a3aab5b
      Jianbo Liu authored
      [ Upstream commit 12a240a4 ]
      
      When deleting vxlan flow rule under multipath, tun_info in parse_attr is
      not freed when the rule is not ready.
      
      Fixes: ef06c9ee ("net/mlx5e: Allow one failure when offloading tc encap rules under multipath")
      Signed-off-by: Jianbo Liu <[email protected]>
      Reviewed-by: Roi Dayan <[email protected]>
      Signed-off-by: Saeed Mahameed <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      3a3aab5b
    • Vadym Kochan's avatar
      net: ipa: fix u32_replace_bits by u32p_xxx version · e77ae86a
      Vadym Kochan authored
      [ Upstream commit c047dc1d ]
      
      Looks like u32p_replace_bits() should be used instead of
      u32_replace_bits(), which does not modify the value in place but
      returns the modified version.
      
      Fixes: 2b9feef2 ("soc: qcom: ipa: filter and routing tables")
      Signed-off-by: Vadym Kochan <[email protected]>
      Reviewed-by: Alex Elder <[email protected]>
      Signed-off-by: David S. Miller <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      e77ae86a
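The difference between a helper that returns a modified copy and one that updates through a pointer can be sketched in plain C. The functions below are hypothetical userspace stand-ins (`replace_bits` and `replace_bits_p`), not the kernel's bitfield helpers, and they assume a non-zero, contiguous field mask.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a "return the new value" helper: the
 * caller's variable is untouched unless the result is assigned back.
 */
static uint32_t replace_bits(uint32_t old, uint32_t val, uint32_t field)
{
	unsigned int shift = 0;

	assert(field != 0); /* an empty field mask has no position */
	while (((field >> shift) & 1u) == 0)
		shift++;
	return (old & ~field) | ((val << shift) & field);
}

/* Hypothetical stand-in for the pointer variant: updates *p in place. */
static void replace_bits_p(uint32_t *p, uint32_t val, uint32_t field)
{
	*p = replace_bits(*p, val, field);
}
```

Calling `replace_bits(reg, 0x5, 0x00000f00)` and discarding the result leaves `reg` unchanged, which is the class of mistake the commit message points at. `replace_bits_p(&reg, 0x5, 0x00000f00)` stores the updated value.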
    • Jason Donenfeld's avatar
      wireguard: peerlookup: take lock before checking hash in replace operation · f41a9b68
      Jason Donenfeld authored
      [ Upstream commit 6147f7b1 ]
      
      Eric's suggested fix for the previous commit's mentioned race condition
      was to simply take the table->lock in wg_index_hashtable_replace(). The
      table->lock of the hash table is supposed to protect the bucket heads,
      not the entries, but actually, since all the mutator functions are
      already taking it, it makes sense to take it too for the test to
      hlist_unhashed, as a defense in depth measure, so that it no longer
      races with deletions, regardless of what other locks are protecting
      individual entries. This is sensible from a performance perspective
      because, as Eric pointed out, the case of being unhashed is already the
      unlikely case, so this won't add common contention. And comparing
      instructions, this basically doesn't make much of a difference other
      than pushing and popping %r13, used by the new `bool ret`. More
      generally, I like the idea of locking consistency across table mutator
      functions, and this might let me rest slightly easier at night.
      Suggested-by: Eric Dumazet <[email protected]>
      Link: https://lore.kernel.org/wireguard/[email protected]/
      Fixes: e7096c13 ("net: WireGuard secure network tunnel")
      Signed-off-by: Jason A. Donenfeld <[email protected]>
      Signed-off-by: David S. Miller <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      f41a9b68
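The idea of performing the "is it still hashed?" test under the same lock as the mutation can be sketched in userspace. This is an illustration only, not the WireGuard code: it assumes a pthread mutex in place of the kernel spinlock, a single-slot table in place of the hash table, and hypothetical names (`table_insert`, `table_remove`, `table_replace`).

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct entry {
	bool hashed; /* true while the entry is linked into the table */
};

struct table {
	pthread_mutex_t lock;
	struct entry *slot; /* single-slot stand-in for a hash bucket */
};

static void table_insert(struct table *t, struct entry *e)
{
	pthread_mutex_lock(&t->lock);
	t->slot = e;
	e->hashed = true;
	pthread_mutex_unlock(&t->lock);
}

static void table_remove(struct table *t, struct entry *e)
{
	pthread_mutex_lock(&t->lock);
	if (e->hashed) {
		t->slot = NULL;
		e->hashed = false;
	}
	pthread_mutex_unlock(&t->lock);
}

/*
 * The check and the replacement happen under one lock hold, so a
 * concurrent table_remove() cannot slip in between them.
 */
static bool table_replace(struct table *t, struct entry *old,
			  struct entry *new_entry)
{
	bool ret;

	pthread_mutex_lock(&t->lock);
	ret = old->hashed;
	if (ret) {
		t->slot = new_entry;
		new_entry->hashed = true;
		old->hashed = false;
	}
	pthread_mutex_unlock(&t->lock);
	return ret;
}
```

If the check were done before taking the lock, a removal on another thread could unlink `old` after the check but before the replacement, which is the window the commit message describes.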
    • Jason Donenfeld's avatar
      wireguard: noise: take lock when removing handshake entry from table · 1aa173b4
      Jason Donenfeld authored
      [ Upstream commit 9179ba31 ]
      
      Eric reported that syzkaller found a race of this variety:
      
      CPU 1                                       CPU 2
      -------------------------------------------|---------------------------------------
      wg_index_hashtable_replace(old, ...)       |
        if (hlist_unhashed(&old->index_hash))    |
                                                 | wg_index_hashtable_remove(old)
                                                 |   hlist_del_init_rcu(&old->index_hash)
                                                 |     old->index_hash.pprev = NULL
        hlist_replace_rcu(&old->index_hash, ...) |
          *old->index_hash.pprev                 |
      
      Syzbot wasn't actually able to reproduce this more than once or create a
      reproducer, because the race window between checking "hlist_unhashed" and
      calling "hlist_replace_rcu" is just so small. Adding an mdelay(5) or
      similar there helps make this demonstrable using this simple script:
      
          #!/bin/bash
          set -ex
          trap 'kill $pid1; kill $pid2; ip link del wg0; ip link del wg1' EXIT
          ip link add wg0 type wireguard
          ip link add wg1 type wireguard
          wg set wg0 private-key <(wg genkey) listen-port 9999
          wg set wg1 private-key <(wg genkey) peer $(wg show wg0 public-key) endpoint 127.0.0.1:9999 persistent-keepalive 1
          wg set wg0 peer $(wg show wg1 public-key)
          ip link set wg0 up
          yes link set wg1 up | ip -force -batch - &
          pid1=$!
          yes link set wg1 down | ip -force -batch - &
          pid2=$!
          wait
      
      The fundamental underlying problem is that we permit calls to
      wg_index_hashtable_remove(handshake.entry) without requiring the caller
      to take the handshake mutex that is intended to protect members of
      handshake during mutations. This is consistently the case with calls to
      wg_index_hashtable_insert(handshake.entry) and
      wg_index_hashtable_replace(handshake.entry), but it's missing from a
      pertinent callsite of wg_index_hashtable_remove(handshake.entry). So,
      this patch makes sure that mutex is taken.
      
      The original code was a little bit funky though, in the form of:
      
          remove(handshake.entry)
          lock(), memzero(handshake.some_members), unlock()
          remove(handshake.entry)
      
      The original intention of that double removal pattern outside the lock
      appears to be some attempt to prevent insertions that might happen while
      locks are dropped during expensive crypto operations, but actually, all
      callers of wg_index_hashtable_insert(handshake.entry) take the write
      lock and then explicitly check handshake.state, as they should, which
      the aforementioned memzero clears, which means an insertion should
      already be impossible. And regardless, the original intention was
      necessarily racy, since it wasn't guaranteed that something else would
      run after the unlock() instead of after the remove(). So, from a
      soundness perspective, it seems positive to remove what looks like a
      hack at best.
      
      The crash from both syzbot and from the script above is as follows:
      
        general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
        KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
        CPU: 0 PID: 7395 Comm: kworker/0:3 Not tainted 5.9.0-rc4-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        Workqueue: wg-kex-wg1 wg_packet_handshake_receive_worker
        RIP: 0010:hlist_replace_rcu include/linux/rculist.h:505 [inline]
        RIP: 0010:wg_index_hashtable_replace+0x176/0x330 drivers/net/wireguard/peerlookup.c:174
        Code: 00 fc ff df 48 89 f9 48 c1 e9 03 80 3c 01 00 0f 85 44 01 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 45 10 48 89 c6 48 c1 ee 03 <80> 3c 0e 00 0f 85 06 01 00 00 48 85 d2 4c 89 28 74 47 e8 a3 4f b5
        RSP: 0018:ffffc90006a97bf8 EFLAGS: 00010246
        RAX: 0000000000000000 RBX: ffff888050ffc4f8 RCX: dffffc0000000000
        RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88808e04e010
        RBP: ffff88808e04e000 R08: 0000000000000001 R09: ffff8880543d0000
        R10: ffffed100a87a000 R11: 000000000000016e R12: ffff8880543d0000
        R13: ffff88808e04e008 R14: ffff888050ffc508 R15: ffff888050ffc500
        FS:  0000000000000000(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00000000f5505db0 CR3: 0000000097cf7000 CR4: 00000000001526f0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
        wg_noise_handshake_begin_session+0x752/0xc9a drivers/net/wireguard/noise.c:820
        wg_receive_handshake_packet drivers/net/wireguard/receive.c:183 [inline]
        wg_packet_handshake_receive_worker+0x33b/0x730 drivers/net/wireguard/receive.c:220
        process_one_work+0x94c/0x1670 kernel/workqueue.c:2269
        worker_thread+0x64c/0x1120 kernel/workqueue.c:2415
        kthread+0x3b5/0x4a0 kernel/kthread.c:292
        ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
      Reported-by: syzbot <[email protected]>
      Reported-by: Eric Dumazet <[email protected]>
      Link: https://lore.kernel.org/wireguard/[email protected]/
      Fixes: e7096c13 ("net: WireGuard secure network tunnel")
      Signed-off-by: Jason A. Donenfeld <[email protected]>
      Signed-off-by: David S. Miller <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      1aa173b4
    • Grygorii Strashko's avatar
      net: ethernet: ti: cpsw_new: fix suspend/resume · 9346c248
      Grygorii Strashko authored
      [ Upstream commit 5760d9ac ]
      
      Add missed suspend/resume callbacks to properly restore networking after
      suspend/resume cycle.
      
      Fixes: ed3525ed ("net: ethernet: ti: introduce cpsw switchdev based driver part 1 - dual-emac")
      Signed-off-by: Grygorii Strashko <[email protected]>
      Signed-off-by: David S. Miller <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      9346c248
    • Eric Dumazet's avatar
      net: add __must_check to skb_put_padto() · b42148e2
      Eric Dumazet authored
      [ Upstream commit 4a009cb0 ]
      
      skb_put_padto() and __skb_put_padto() callers
      must check return values or risk use-after-free.
      Signed-off-by: Eric Dumazet <[email protected]>
      Signed-off-by: David S. Miller <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      b42148e2
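The "free on failure, so the caller must check" contract can be illustrated with a userspace sketch. It is not the skb API: `struct buf`, `buf_new`, `buf_pad_to`, and the `BUF_MAX` limit are hypothetical. The GCC/Clang `warn_unused_result` attribute stands in for the kernel's `__must_check` annotation.

```c
#include <stdlib.h>
#include <string.h>

#define BUF_MAX 4096 /* hypothetical upper bound, used to force a failure */

struct buf {
	unsigned char *data;
	size_t len;
};

static struct buf *buf_new(size_t len)
{
	struct buf *b = malloc(sizeof(*b));

	if (!b)
		return NULL;
	b->data = calloc(len ? len : 1, 1);
	if (!b->data) {
		free(b);
		return NULL;
	}
	b->len = len;
	return b;
}

/*
 * Zero-pads the buffer up to len bytes. On failure the buffer is freed
 * and -1 is returned, so the caller must not touch b afterwards.
 */
__attribute__((warn_unused_result))
static int buf_pad_to(struct buf *b, size_t len)
{
	unsigned char *p;

	if (len <= b->len)
		return 0;
	p = (len <= BUF_MAX) ? realloc(b->data, len) : NULL;
	if (!p) {
		free(b->data);
		free(b);
		return -1;
	}
	memset(p + b->len, 0, len - b->len);
	b->data = p;
	b->len = len;
	return 0;
}
```

A caller that ignores the return value and keeps using `b` after a failed `buf_pad_to()` would be reading freed memory. With the attribute in place, the compiler warns about the ignored result at build time.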
    • Eric Dumazet's avatar
      net: qrtr: check skb_put_padto() return value · 9cb51c5f
      Eric Dumazet authored
      [ Upstream commit 3ca1a42a ]
      
      If skb_put_padto() returns an error, skb has been freed.
      Better not touch it anymore, as reported by syzbot [1]
      
      Note to qrtr maintainers : this suggests qrtr_sendmsg()
      should adjust sock_alloc_send_skb() second parameter
      to account for the potential added alignment to avoid
      reallocation.
      
      [1]
      
      BUG: KASAN: use-after-free in __skb_insert include/linux/skbuff.h:1907 [inline]
      BUG: KASAN: use-after-free in __skb_queue_before include/linux/skbuff.h:2016 [inline]
      BUG: KASAN: use-after-free in __skb_queue_tail include/linux/skbuff.h:2049 [inline]
      BUG: KASAN: use-after-free in skb_queue_tail+0x6b/0x120 net/core/skbuff.c:3146
      Write of size 8 at addr ffff88804d8ab3c0 by task syz-executor.4/4316
      
      CPU: 1 PID: 4316 Comm: syz-executor.4 Not tainted 5.9.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1d6/0x29e lib/dump_stack.c:118
       print_address_description+0x66/0x620 mm/kasan/report.c:383
       __kasan_report mm/kasan/report.c:513 [inline]
       kasan_report+0x132/0x1d0 mm/kasan/report.c:530
       __skb_insert include/linux/skbuff.h:1907 [inline]
       __skb_queue_before include/linux/skbuff.h:2016 [inline]
       __skb_queue_tail include/linux/skbuff.h:2049 [inline]
       skb_queue_tail+0x6b/0x120 net/core/skbuff.c:3146
       qrtr_tun_send+0x1a/0x40 net/qrtr/tun.c:23
       qrtr_node_enqueue+0x44f/0xc00 net/qrtr/qrtr.c:364
       qrtr_bcast_enqueue+0xbe/0x140 net/qrtr/qrtr.c:861
       qrtr_sendmsg+0x680/0x9c0 net/qrtr/qrtr.c:960
       sock_sendmsg_nosec net/socket.c:651 [inline]
       sock_sendmsg net/socket.c:671 [inline]
       sock_write_iter+0x317/0x470 net/socket.c:998
       call_write_iter include/linux/fs.h:1882 [inline]
       new_sync_write fs/read_write.c:503 [inline]
       vfs_write+0xa96/0xd10 fs/read_write.c:578
       ksys_write+0x11b/0x220 fs/read_write.c:631
       do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x45d5b9
      Code: 5d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 2b b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f84b5b81c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 0000000000038b40 RCX: 000000000045d5b9
      RDX: 0000000000000055 RSI: 0000000020001240 RDI: 0000000000000003
      RBP: 00007f84b5b81ca0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000f
      R13: 00007ffcbbf86daf R14: 00007f84b5b829c0 R15: 000000000118cf4c
      
      Allocated by task 4316:
       kasan_save_stack mm/kasan/common.c:48 [inline]
       kasan_set_track mm/kasan/common.c:56 [inline]
       __kasan_kmalloc+0x100/0x130 mm/kasan/common.c:461
       slab_post_alloc_hook+0x3e/0x290 mm/slab.h:518
       slab_alloc mm/slab.c:3312 [inline]
       kmem_cache_alloc+0x1c1/0x2d0 mm/slab.c:3482
       skb_clone+0x1b2/0x370 net/core/skbuff.c:1449
       qrtr_bcast_enqueue+0x6d/0x140 net/qrtr/qrtr.c:857
       qrtr_sendmsg+0x680/0x9c0 net/qrtr/qrtr.c:960
       sock_sendmsg_nosec net/socket.c:651 [inline]
       sock_sendmsg net/socket.c:671 [inline]
       sock_write_iter+0x317/0x470 net/socket.c:998
       call_write_iter include/linux/fs.h:1882 [inline]
       new_sync_write fs/read_write.c:503 [inline]
       vfs_write+0xa96/0xd10 fs/read_write.c:578
       ksys_write+0x11b/0x220 fs/read_write.c:631
       do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Freed by task 4316:
       kasan_save_stack mm/kasan/common.c:48 [inline]
       kasan_set_track+0x3d/0x70 mm/kasan/common.c:56
       kasan_set_free_info+0x17/0x30 mm/kasan/generic.c:355
       __kasan_slab_free+0xdd/0x110 mm/kasan/common.c:422
       __cache_free mm/slab.c:3418 [inline]
       kmem_cache_free+0x82/0xf0 mm/slab.c:3693
       __skb_pad+0x3f5/0x5a0 net/core/skbuff.c:1823
       __skb_put_padto include/linux/skbuff.h:3233 [inline]
       skb_put_padto include/linux/skbuff.h:3252 [inline]
       qrtr_node_enqueue+0x62f/0xc00 net/qrtr/qrtr.c:360
       qrtr_bcast_enqueue+0xbe/0x140 net/qrtr/qrtr.c:861
       qrtr_sendmsg+0x680/0x9c0 net/qrtr/qrtr.c:960
       sock_sendmsg_nosec net/socket.c:651 [inline]
       sock_sendmsg net/socket.c:671 [inline]
       sock_write_iter+0x317/0x470 net/socket.c:998
       call_write_iter include/linux/fs.h:1882 [inline]
       new_sync_write fs/read_write.c:503 [inline]
       vfs_write+0xa96/0xd10 fs/read_write.c:578
       ksys_write+0x11b/0x220 fs/read_write.c:631
       do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The buggy address belongs to the object at ffff88804d8ab3c0
       which belongs to the cache skbuff_head_cache of size 224
      The buggy address is located 0 bytes inside of
       224-byte region [ffff88804d8ab3c0, ffff88804d8ab4a0)
      The buggy address belongs to the page:
      page:00000000ea8cccfb refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88804d8abb40 pfn:0x4d8ab
      flags: 0xfffe0000000200(slab)
      raw: 00fffe0000000200 ffffea0002237ec8 ffffea00029b3388 ffff88821bb66800
      raw: ffff88804d8abb40 ffff88804d8ab000 000000010000000b 0000000000000000
      page dumped because: kasan: bad access detected
      
      Fixes: ce57785b ("net: qrtr: fix len of skb_put_padto in qrtr_node_enqueue")
      Signed-off-by: Eric Dumazet <[email protected]>
      Reported-by: syzbot <[email protected]>
      Cc: Carl Huang <[email protected]>
      Cc: Wen Gong <[email protected]>
      Cc: Bjorn Andersson <[email protected]>
      Cc: Manivannan Sadhasivam <[email protected]>
      Acked-by: Manivannan Sadhasivam <[email protected]>
      Reviewed-by: Bjorn Andersson <[email protected]>
      Signed-off-by: David S. Miller <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      9cb51c5f
    • Florian Fainelli's avatar
      net: phy: Do not warn in phy_stop() on PHY_DOWN · 410c186d
      Florian Fainelli authored
      [ Upstream commit 5116a8ad ]
      
      When phy_is_started() was added to catch incorrect PHY states,
      phy_stop() would not be qualified against PHY_DOWN. It is possible to
      reach that state when the PHY driver has been unbound and the network
      device is then brought down.
      
      Fixes: 2b3e88ea ("net: phy: improve phy state checking")
      Signed-off-by: Florian Fainelli <[email protected]>
      Reviewed-by: Andrew Lunn <[email protected]>
      Signed-off-by: David S. Miller <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      410c186d
    • Florian Fainelli's avatar
      net: phy: Avoid NPD upon phy_detach() when driver is unbound · 3fa1f846
      Florian Fainelli authored
      [ Upstream commit c2b727df ]
      
      If we have unbound the PHY driver prior to calling phy_detach() (often
      via phy_disconnect()) then we can cause a NULL pointer de-reference
      accessing the driver owner member. The steps to reproduce are:
      
      echo unimac-mdio-0:01 > /sys/class/net/eth0/phydev/driver/unbind
      ip link set eth0 down
      
      Fixes: cafe8df8 ("net: phy: Fix lack of reference count on PHY driver")
      Signed-off-by: Florian Fainelli <[email protected]>
      Reviewed-by: Andrew Lunn <[email protected]>
      Signed-off-by: David S. Miller <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      3fa1f846
    • Hauke Mehrtens's avatar
      net: lantiq: Disable IRQs only if NAPI gets scheduled · 318d2408
      Hauke Mehrtens authored
      [ Upstream commit 9423361d ]
      
      The napi_schedule() call will only schedule the NAPI if it is not
      already running. To make sure that we do not deactivate interrupts
      without scheduling NAPI, only deactivate the interrupts when NAPI
      actually gets scheduled.

      Signed-off-by: Hauke Mehrtens <[email protected]>
      Signed-off-by: David S. Miller <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      318d2408
    • Hauke Mehrtens's avatar
      net: lantiq: Use napi_complete_done() · c36c0630
      Hauke Mehrtens authored
      [ Upstream commit c582a7fe ]
      
      Use napi_complete_done() and activate the interrupts when this function
      returns true. This way the generic NAPI code can take care of activating
      the interrupts.

      Signed-off-by: Hauke Mehrtens <[email protected]>
      Signed-off-by: David S. Miller <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      c36c0630
    • Hauke Mehrtens's avatar
      net: lantiq: use netif_tx_napi_add() for TX NAPI · 5e0c1c65
      Hauke Mehrtens authored
      [ Upstream commit 74c7b80e ]
      
      netif_tx_napi_add() should be used for NAPI in the TX direction instead
      of the netif_napi_add() function.

      Signed-off-by: Hauke Mehrtens <[email protected]>
      Signed-off-by: David S. Miller <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      5e0c1c65
    • Hauke Mehrtens's avatar
      net: lantiq: Wake TX queue again · 345fa127
      Hauke Mehrtens authored
      [ Upstream commit dea36631 ]
      
      The call to netif_wake_queue() when the TX descriptors were freed was
      missing. When no TX buffers are available the TX queue is stopped,
      but it was not started again once buffers became available. This
      patch fixes that.
      
      Fixes: fe1a5642 ("net: lantiq: Add Lantiq / Intel VRX200 Ethernet driver")
      Signed-off-by: Hauke Mehrtens <[email protected]>
      Signed-off-by: David S. Miller <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      345fa127
    • Michael Chan's avatar
      bnxt_en: Protect bnxt_set_eee() and bnxt_set_pauseparam() with mutex. · 786c627c
      Michael Chan authored
      [ Upstream commit a5390690 ]
      
      All changes related to bp->link_info require the protection of the
      link_lock mutex.  It's not sufficient to rely just on RTNL.
      
      Fixes: 163e9ef6 ("bnxt_en: Fix race when modifying pause settings.")
      Reviewed-by: Edwin Peer <[email protected]>
      Signed-off-by: Michael Chan <[email protected]>
      Signed-off-by: David S. Miller <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      786c627c
    • Edwin Peer's avatar
      bnxt_en: return proper error codes in bnxt_show_temp · 138ded90
      Edwin Peer authored
      [ Upstream commit d69753fa ]
      
      Returning "unknown" as a temperature value violates the hwmon interface
      rules. Appropriate error codes should be returned via device_attribute
      show instead. These will ultimately be propagated to the user via the
      file system interface.
      
      In addition to the corrected error handling, it is an even better idea to
      not present the sensor in sysfs at all if it is known that the read will
      definitely fail. Given that temp1_input is currently the only sensor
      reported, ensure no hwmon registration if TEMP_MONITOR_QUERY is not
      supported or if it will fail due to access permissions. Something smarter
      may be needed if and when other sensors are added.
      
      Fixes: 12cce90b ("bnxt_en: fix HWRM error when querying VF temperature")
      Signed-off-by: Edwin Peer <[email protected]>
      Signed-off-by: Michael Chan <[email protected]>
      Signed-off-by: David S. Miller <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      138ded90
    • Vasundhara Volam's avatar
      bnxt_en: Use memcpy to copy VPD field info. · 693ba8c8
      Vasundhara Volam authored
      [ Upstream commit 492adcf4 ]
      
      Using strlcpy() to copy from VPD is not correct because VPD strings
      are not necessarily NUL-terminated.  Use memcpy() to copy the VPD
      field, up to the destination buffer size - 1 bytes.  The destination
      is zeroed memory so it will always be NUL-terminated.
      
      Fixes: a0d0fd70 ("bnxt_en: Read partno and serialno of the board from VPD")
      Signed-off-by: Vasundhara Volam <[email protected]>
      Signed-off-by: Michael Chan <[email protected]>
      Signed-off-by: David S. Miller <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      693ba8c8
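As an illustration of the bnxt_en VPD change above, here is a minimal userspace sketch of copying a field that may lack a terminating NUL into a zeroed, fixed-size buffer. The helper name `copy_vpd_field` and its parameters are hypothetical and are not taken from the driver; only `memset()` and `memcpy()` are real library calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not the driver's code): copy a VPD field that is
 * not guaranteed to be NUL-terminated into a fixed-size destination.
 * Returns the number of bytes copied.
 */
static size_t copy_vpd_field(char *dst, size_t dst_size,
                             const unsigned char *field, size_t field_len)
{
    size_t n;

    assert(dst != NULL && field != NULL);
    if (dst_size == 0)
        return 0;

    /* Zero the destination so the result is always NUL-terminated. */
    memset(dst, 0, dst_size);

    /* Copy at most dst_size - 1 bytes, leaving room for the terminator. */
    n = field_len < dst_size - 1 ? field_len : dst_size - 1;
    memcpy(dst, field, n);
    return n;
}
```

Unlike a string-copy routine, this never reads past `field_len` bytes of the source, which is the property the commit message relies on.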
    • Tariq Toukan's avatar
      net/mlx5e: TLS, Do not expose FPGA TLS counter if not supported · 6229623f
      Tariq Toukan authored
      [ Upstream commit 8f0bcd19 ]
      
      The set of TLS TX global SW counters in mlx5e_tls_sw_stats_desc
      is updated from all rings by using atomic ops.
      This set of stats is used only in the FPGA TLS use case, not in
      the Connect-X TLS one, where regular per-ring counters are used.
      
      Do not expose them in the Connect-X use case, as this would cause
      counter duplication. For example, tx_tls_drop_no_sync_data would
      appear twice in the ethtool stats.
      
      Fixes: d2ead1f3 ("net/mlx5e: Add kTLS TX HW offload support")
      Signed-off-by: Tariq Toukan <[email protected]>
      Reviewed-by: Moshe Shemesh <[email protected]>
      Signed-off-by: Saeed Mahameed <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      6229623f
    • Maor Dickman's avatar
      net/mlx5e: Enable adding peer miss rules only if merged eswitch is supported · 46ef5581
      Maor Dickman authored
      [ Upstream commit 6cec0229 ]
      
      The cited commit creates a peer miss group during switchdev mode
      initialization in order to handle miss packets correctly while in VF
      LAG mode. This is done regardless of FW support for such groups, which
      could cause rule setup to fail later on.
      
      Fix by adding FW capability check before creating peer groups/rule.
      
      Fixes: ac004b83 ("net/mlx5e: E-Switch, Add peer miss rules")
      Signed-off-by: Maor Dickman <[email protected]>
      Reviewed-by: Roi Dayan <[email protected]>
      Reviewed-by: Raed Salem <[email protected]>
      Signed-off-by: Saeed Mahameed <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      46ef5581
    • Xin Long's avatar
      tipc: use skb_unshare() instead in tipc_buf_append() · 41f2b0bd
      Xin Long authored
      [ Upstream commit ff48b622 ]
      
      tipc_buf_append() may change an skb's frag_list, which causes
      problems when this skb is cloned. skb_unclone() doesn't really
      make this skb's frag_list available to change.
      
      Shuang Li has reported a use-after-free issue because of this
      when creating quite a few macvlan devices over the same device, where
      the broadcast packets will be cloned and go up to the stack:
      
       [ ] BUG: KASAN: use-after-free in pskb_expand_head+0x86d/0xea0
       [ ] Call Trace:
       [ ]  dump_stack+0x7c/0xb0
       [ ]  print_address_description.constprop.7+0x1a/0x220
       [ ]  kasan_report.cold.10+0x37/0x7c
       [ ]  check_memory_region+0x183/0x1e0
       [ ]  pskb_expand_head+0x86d/0xea0
       [ ]  process_backlog+0x1df/0x660
       [ ]  net_rx_action+0x3b4/0xc90
       [ ]
       [ ] Allocated by task 1786:
       [ ]  kmem_cache_alloc+0xbf/0x220
       [ ]  skb_clone+0x10a/0x300
       [ ]  macvlan_broadcast+0x2f6/0x590 [macvlan]
       [ ]  macvlan_process_broadcast+0x37c/0x516 [macvlan]
       [ ]  process_one_work+0x66a/0x1060
       [ ]  worker_thread+0x87/0xb10
       [ ]
       [ ] Freed by task 3253:
       [ ]  kmem_cache_free+0x82/0x2a0
       [ ]  skb_release_data+0x2c3/0x6e0
       [ ]  kfree_skb+0x78/0x1d0
       [ ]  tipc_recvmsg+0x3be/0xa40 [tipc]
      
      So fix it by using skb_unshare() instead, which creates a new
      skb for the cloned frag, making it safe to change its frag_list.
      A similar approach is used in sctp_make_reassembled_event(),
      which uses skb_copy().

      Reported-by: Shuang Li <[email protected]>
      Fixes: 37e22164 ("tipc: rename and move message reassembly function")
      Signed-off-by: Xin Long <[email protected]>
      Signed-off-by: David S. Miller <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      41f2b0bd
    • Tetsuo Handa's avatar
      tipc: fix shutdown() of connection oriented socket · d8e7be19
      Tetsuo Handa authored
      [ Upstream commit a4b5cc9e ]
      
      I confirmed that the problem fixed by commit 2a63866c ("tipc: fix
      shutdown() of connectionless socket") also applies to stream socket.
      
      ----------
      #include <sys/socket.h>
      #include <unistd.h>
      #include <sys/wait.h>
      
      int main(int argc, char *argv[])
      {
              int fds[2] = { -1, -1 };
              socketpair(PF_TIPC, SOCK_STREAM /* or SOCK_DGRAM */, 0, fds);
              if (fork() == 0)
                      _exit(read(fds[0], NULL, 1));
              shutdown(fds[0], SHUT_RDWR); /* This must make read() return. */
              wait(NULL); /* To be woken up by _exit(). */
              return 0;
      }
      ----------
      
      Since shutdown(SHUT_RDWR) should affect all processes sharing that socket,
      unconditionally setting sk->sk_shutdown to SHUTDOWN_MASK will be the right
      behavior.

      Signed-off-by: Tetsuo Handa <[email protected]>
      Acked-by: Ying Xue <[email protected]>
      Signed-off-by: David S. Miller <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      d8e7be19
    • Peilin Ye's avatar
      tipc: Fix memory leak in tipc_group_create_member() · 7ada162e
      Peilin Ye authored
      [ Upstream commit bb3a420d ]
      
      tipc_group_add_to_tree() returns silently if `key` matches `nkey` of an
      existing node, causing tipc_group_create_member() to leak memory. Let
      tipc_group_add_to_tree() return an error in such a case, so that
      tipc_group_create_member() can handle it properly.
      
      Fixes: 75da2163 ("tipc: introduce communication groups")
      Reported-and-tested-by: [email protected]
      Cc: Hillf Danton <[email protected]>
      Link: https://syzkaller.appspot.com/bug?id=048390604fe1b60df34150265479202f10e13aff
      Signed-off-by: Peilin Ye <[email protected]>
      Signed-off-by: David S. Miller <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      7ada162e
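The leak pattern described in the tipc_group_create_member() commit above can be sketched in userspace as follows. This is only an illustration under stated assumptions: the `ex_` names and the plain binary search tree are hypothetical stand-ins, not the TIPC code. The point is that the insert routine reports a duplicate key so the caller can free the object it just allocated.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a group member keyed by a 64-bit value. */
struct ex_member {
    uint64_t key;
    struct ex_member *left, *right;
};

/*
 * Insert m into an unbalanced binary search tree.
 * Returns 0 on success, or -EEXIST if the key is already present
 * (instead of returning silently, which would hide the failure).
 */
static int ex_tree_add(struct ex_member **root, struct ex_member *m)
{
    assert(root != NULL && m != NULL);
    while (*root) {
        if (m->key < (*root)->key)
            root = &(*root)->left;
        else if (m->key > (*root)->key)
            root = &(*root)->right;
        else
            return -EEXIST;
    }
    *root = m;
    return 0;
}

/* Allocate a member and insert it; free it again if insertion fails. */
static struct ex_member *ex_create_member(struct ex_member **root, uint64_t key)
{
    struct ex_member *m = calloc(1, sizeof(*m));

    if (!m)
        return NULL;
    m->key = key;
    if (ex_tree_add(root, m) < 0) {
        free(m); /* without the error return, this object would leak */
        return NULL;
    }
    return m;
}
```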
    • Vinicius Costa Gomes's avatar
      taprio: Fix allowing too small intervals · 817ff507
      Vinicius Costa Gomes authored
      [ Upstream commit b5b73b26 ]
      
      It's possible that the user specifies an interval that couldn't allow
      any packet to be transmitted. This also avoids the issue of the
      hrtimer handler starving the other threads because it's running too
      often.
      
      The solution is to reject interval sizes that according to the current
      link speed wouldn't allow any packet to be transmitted.
      
      Reported-by: [email protected]
      Fixes: 5a781ccb ("tc: Add support for configuring the taprio scheduler")
      Signed-off-by: Vinicius Costa Gomes <[email protected]>
      Signed-off-by: David S. Miller <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      817ff507
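The check described in the taprio commit above amounts to simple arithmetic: an interval is useless if it is shorter than the time needed to put one minimum-size frame on the wire at the current link speed. The sketch below is an assumption-laden illustration, not the scheduler's code: the function names are hypothetical, the link speed is taken in Mbit/s, and the minimum frame length is assumed to be 60 bytes (Ethernet minimum without FCS).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed minimum Ethernet frame length in bytes, excluding the FCS. */
#define EX_MIN_FRAME_LEN 60

/*
 * Time in nanoseconds needed to transmit len bytes at speed_mbps.
 * 1 Mbit/s is one bit per microsecond, so bits * 1000 / Mbit/s = ns.
 */
static uint64_t ex_tx_duration_ns(uint64_t len, uint64_t speed_mbps)
{
    assert(speed_mbps > 0);
    return len * 8 * 1000 / speed_mbps;
}

/* An interval is acceptable only if one minimum-size frame fits in it. */
static int ex_interval_is_valid(uint64_t interval_ns, uint64_t speed_mbps)
{
    return interval_ns >= ex_tx_duration_ns(EX_MIN_FRAME_LEN, speed_mbps);
}
```

For example, at 1000 Mbit/s a 60-byte frame takes 480 ns, so under these assumptions any interval below 480 ns would be rejected.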
    • Jakub Kicinski's avatar
      nfp: use correct define to return NONE fec · b193ee2a
      Jakub Kicinski authored
      [ Upstream commit 5f6857e8 ]
      
      struct ethtool_fecparam carries bitmasks not bit numbers.
      We want to return 1 (NONE), not 0.
      
      Fixes: 0d087093 ("nfp: implement ethtool FEC mode settings")
      Signed-off-by: Jakub Kicinski <[email protected]>
      Reviewed-by: Simon Horman <[email protected]>
      Reviewed-by: Jesse Brandeburg <[email protected]>
      Signed-off-by: David S. Miller <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      b193ee2a
    • Henry Ptasinski's avatar
      net: sctp: Fix IPv6 ancestor_size calc in sctp_copy_descendant · 5bda4b3e
      Henry Ptasinski authored
      [ Upstream commit fe81d9f6 ]
      
      When calculating ancestor_size with IPv6 enabled, simply using
      sizeof(struct ipv6_pinfo) doesn't account for extra bytes needed for
      alignment in the struct sctp6_sock. On x86, there aren't any extra
      bytes, but on ARM the ipv6_pinfo structure is aligned on an 8-byte
      boundary so there were 4 pad bytes that were omitted from the
      ancestor_size calculation.  This would lead to corruption of the
      pd_lobby pointers, causing an oops when trying to free the sctp
      structure on socket close.
      
      Fixes: 636d25d5 ("sctp: not copy sctp_sock pd_lobby in sctp_copy_descendant")
      Signed-off-by: Henry Ptasinski <[email protected]>
      Acked-by: Marcelo Ricardo Leitner <[email protected]>
      Signed-off-by: David S. Miller <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      5bda4b3e
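The alignment problem described in the sctp commit above can be reproduced with toy structures. In this sketch the `ex_` types are hypothetical and the 8-byte alignment is forced with `_Alignas` so that the layout does not depend on the target ABI; it only shows why adding up `sizeof()` of the parts can miss padding that `offsetof()` on the containing structure accounts for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 12 bytes, 4-byte aligned: stands in for the leading part of the socket. */
struct ex_head {
    uint32_t x[3];
};

/* 8 bytes, forced to 8-byte alignment: stands in for the trailing part. */
struct ex_tail {
    _Alignas(8) uint32_t y[2];
};

/* Container: the compiler inserts 4 pad bytes between head and tail. */
struct ex_sock {
    struct ex_head head;
    struct ex_tail tail;
};

static_assert(sizeof(struct ex_head) == 12, "head is three 32-bit words");

/* Naive size: sum of the parts, which ignores the padding. */
static size_t ex_naive_size(void)
{
    return sizeof(struct ex_head) + sizeof(struct ex_tail);
}

/* Padding the naive sum misses, measured against the real layout. */
static size_t ex_padding_before_tail(void)
{
    return offsetof(struct ex_sock, tail) - sizeof(struct ex_head);
}
```

With this layout the naive sum is 20 bytes while the container is 24 bytes, matching the 4 omitted pad bytes mentioned in the commit message.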
    • Yunsheng Lin's avatar
      net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc · bd6df026
      Yunsheng Lin authored
      [ Upstream commit 2fb541c8 ]
      
      Currently the reset and enqueue operations can run concurrently on
      the same lockless qdisc, because there is no lock to synchronize
      q->enqueue() in __dev_xmit_skb() with the qdisc reset operation in
      qdisc_deactivate() called by dev_deactivate_queue(). This may cause
      out-of-bounds access for priv->ring[] in the hns3 driver if the user
      has requested a smaller queue num while __dev_xmit_skb() still
      enqueues an skb with a larger queue_mapping after the corresponding
      qdisc is reset, and hns3_nic_net_xmit() is later called with that skb.
      
      Reuse the existing synchronize_net() in dev_deactivate_many() to
      make sure that an skb with a larger queue_mapping enqueued to the old
      qdisc (which is saved in dev_queue->qdisc_sleeping) will always be
      reset when dev_reset_queue() is called.
      
      Fixes: 6b3ba914 ("net: sched: allow qdiscs to handle locking")
      Signed-off-by: Yunsheng Lin <[email protected]>
      Signed-off-by: David S. Miller <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      bd6df026
    • Xin Long's avatar
      net: sched: initialize with 0 before setting erspan md->u · abdd4687
      Xin Long authored
      [ Upstream commit 8e1b3ac4 ]
      
      In fl_set_erspan_opt(), all bits of the erspan md were set to 1, as
      this function is also used to set the opt MASK. However, when setting
      md->u.index for the opt VALUE, the remaining bits of the union md->u
      are left as 1. This causes the match of the whole md to fail when
      version is 1 and only index is set.
      
      Fix this by initializing erspan md->u with 0 before setting it.

      Reported-by: Shuang Li <[email protected]>
      Fixes: 79b1011c ("net: sched: allow flower to match erspan options")
      Signed-off-by: Xin Long <[email protected]>
      Signed-off-by: David S. Miller <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      abdd4687
    • Yoshihiro Shimoda's avatar
      net: phy: call phy_disable_interrupts() in phy_attach_direct() instead · 92862332
      Yoshihiro Shimoda authored
      [ Upstream commit 7d3ba936 ]
      
      Since the micrel phy driver calls phy_init_hw() as a workaround,
      the commit 9886a4db ("net: phy: call phy_disable_interrupts()
      in phy_init_hw()") disables the interrupt unexpectedly. So,
      call phy_disable_interrupts() in phy_attach_direct() instead.
      Otherwise, the phy cannot link up after the ethernet cable was
      disconnected.
      
      Note that other drivers (like at803x.c) also call phy_init_hw(),
      so they may have been affected by a similar issue.
      
      Fixes: 9886a4db ("net: phy: call phy_disable_interrupts() in phy_init_hw()")
      Signed-off-by: Yoshihiro Shimoda <[email protected]>
      Signed-off-by: David S. Miller <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      92862332
    • Maor Gottlieb's avatar
      net/mlx5: Fix FTE cleanup · e9fd16fc
      Maor Gottlieb authored
      [ Upstream commit cefc2355 ]
      
      Currently, when an FTE is allocated, its refcount is decreased to 0
      so that it is not a standalone steering object, and every rule
      (destination) of the FTE increases the refcount.
      When mlx5_cleanup_fs is called while not all rules were deleted by the
      steering users, it hits a refcount underflow on the FTE once clean_tree
      calls tree_remove_node after the deleted rules already decreased
      the refcount to 0.
      
      The FTE is no longer destroyed implicitly when the last rule (destination)
      is deleted. mlx5_del_flow_rules avoids that by increasing the refcount on
      the FTE and destroying it explicitly after all rules were deleted. So we
      can avoid the refcount underflow by making the FTE a standalone object.
      In addition, del_hw_func needs to be set on the FTE so that the HW object
      is destroyed when the FTE is deleted from the cleanup_tree flow.
      
      refcount_t: underflow; use-after-free.
      WARNING: CPU: 2 PID: 15715 at lib/refcount.c:28 refcount_warn_saturate+0xd9/0xe0
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      Call Trace:
       tree_put_node+0xf2/0x140 [mlx5_core]
       clean_tree+0x4e/0xf0 [mlx5_core]
       clean_tree+0x4e/0xf0 [mlx5_core]
       clean_tree+0x4e/0xf0 [mlx5_core]
       clean_tree+0x5f/0xf0 [mlx5_core]
       clean_tree+0x4e/0xf0 [mlx5_core]
       clean_tree+0x5f/0xf0 [mlx5_core]
       mlx5_cleanup_fs+0x26/0x270 [mlx5_core]
       mlx5_unload+0x2e/0xa0 [mlx5_core]
       mlx5_unload_one+0x51/0x120 [mlx5_core]
       mlx5_devlink_reload_down+0x51/0x90 [mlx5_core]
       devlink_reload+0x39/0x120
       ? devlink_nl_cmd_reload+0x43/0x220
       genl_rcv_msg+0x1e4/0x420
       ? genl_family_rcv_msg_attrs_parse+0x100/0x100
       netlink_rcv_skb+0x47/0x110
       genl_rcv+0x24/0x40
       netlink_unicast+0x217/0x2f0
       netlink_sendmsg+0x30f/0x430
       sock_sendmsg+0x30/0x40
       __sys_sendto+0x10e/0x140
       ? handle_mm_fault+0xc4/0x1f0
       ? do_page_fault+0x33f/0x630
       __x64_sys_sendto+0x24/0x30
       do_syscall_64+0x48/0x130
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 718ce4d6 ("net/mlx5: Consolidate update FTE for all removal changes")
      Fixes: bd71b08e ("net/mlx5: Support multiple updates of steering rules in parallel")
      Signed-off-by: Maor Gottlieb <[email protected]>
      Reviewed-by: Mark Bloch <[email protected]>
      Signed-off-by: Saeed Mahameed <[email protected]>
      Signed-off-by: Greg Kroah-Hartman <[email protected]>
      e9fd16fc