1. 05 Apr, 2019 5 commits
    • Florian Westphal's avatar
      netfilter: physdev: relax br_netfilter dependency · ebd0f306
      Florian Westphal authored
      [ Upstream commit 8e2f311a ]
      
      Following command:
        iptables -D FORWARD -m physdev ...
      causes connectivity loss in some setups.
      
      Reason is that iptables userspace will probe kernel for the module revision
      of the physdev patch, and physdev has an artificial dependency on
      br_netfilter (xt_physdev use makes no sense unless a br_netfilter module
      is loaded).
      
      This causes the "phydev" module to be loaded, which in turn enables the
      "call-iptables" infrastructure.
      
      bridged packets might then get dropped by the iptables ruleset.
      
      The better fix would be to change the "call-iptables" defaults to 0 and
      enforce explicit setting to 1, but that breaks backwards compatibility.
      
      This does the next best thing: add a request_module call to checkentry.
      This was a stray '-D ... -m physdev' won't activate br_netfilter
      anymore.
      Signed-off-by: 's avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: 's avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: 's avatarSasha Levin <sashal@kernel.org>
      ebd0f306
    • Chieh-Min Wang's avatar
      netfilter: conntrack: fix cloned unconfirmed skb->_nfct race in __nf_conntrack_confirm · f74b0a4b
      Chieh-Min Wang authored
      [ Upstream commit 13f5251f ]
      
      For bridge(br_flood) or broadcast/multicast packets, they could clone
      skb with unconfirmed conntrack which break the rule that unconfirmed
      skb->_nfct is never shared.  With nfqueue running on my system, the race
      can be easily reproduced with following warning calltrace:
      
      [13257.707525] CPU: 0 PID: 12132 Comm: main Tainted: P        W       4.4.60 #7744
      [13257.707568] Hardware name: Qualcomm (Flattened Device Tree)
      [13257.714700] [<c021f6dc>] (unwind_backtrace) from [<c021bce8>] (show_stack+0x10/0x14)
      [13257.720253] [<c021bce8>] (show_stack) from [<c0449e10>] (dump_stack+0x94/0xa8)
      [13257.728240] [<c0449e10>] (dump_stack) from [<c022a7e0>] (warn_slowpath_common+0x94/0xb0)
      [13257.735268] [<c022a7e0>] (warn_slowpath_common) from [<c022a898>] (warn_slowpath_null+0x1c/0x24)
      [13257.743519] [<c022a898>] (warn_slowpath_null) from [<c06ee450>] (__nf_conntrack_confirm+0xa8/0x618)
      [13257.752284] [<c06ee450>] (__nf_conntrack_confirm) from [<c0772670>] (ipv4_confirm+0xb8/0xfc)
      [13257.761049] [<c0772670>] (ipv4_confirm) from [<c06e7a60>] (nf_iterate+0x48/0xa8)
      [13257.769725] [<c06e7a60>] (nf_iterate) from [<c06e7af0>] (nf_hook_slow+0x30/0xb0)
      [13257.777108] [<c06e7af0>] (nf_hook_slow) from [<c07f20b4>] (br_nf_post_routing+0x274/0x31c)
      [13257.784486] [<c07f20b4>] (br_nf_post_routing) from [<c06e7a60>] (nf_iterate+0x48/0xa8)
      [13257.792556] [<c06e7a60>] (nf_iterate) from [<c06e7af0>] (nf_hook_slow+0x30/0xb0)
      [13257.800458] [<c06e7af0>] (nf_hook_slow) from [<c07e5580>] (br_forward_finish+0x94/0xa4)
      [13257.808010] [<c07e5580>] (br_forward_finish) from [<c07f22ac>] (br_nf_forward_finish+0x150/0x1ac)
      [13257.815736] [<c07f22ac>] (br_nf_forward_finish) from [<c06e8df0>] (nf_reinject+0x108/0x170)
      [13257.824762] [<c06e8df0>] (nf_reinject) from [<c06ea854>] (nfqnl_recv_verdict+0x3d8/0x420)
      [13257.832924] [<c06ea854>] (nfqnl_recv_verdict) from [<c06e940c>] (nfnetlink_rcv_msg+0x158/0x248)
      [13257.841256] [<c06e940c>] (nfnetlink_rcv_msg) from [<c06e5564>] (netlink_rcv_skb+0x54/0xb0)
      [13257.849762] [<c06e5564>] (netlink_rcv_skb) from [<c06e4ec8>] (netlink_unicast+0x148/0x23c)
      [13257.858093] [<c06e4ec8>] (netlink_unicast) from [<c06e5364>] (netlink_sendmsg+0x2ec/0x368)
      [13257.866348] [<c06e5364>] (netlink_sendmsg) from [<c069fb8c>] (sock_sendmsg+0x34/0x44)
      [13257.874590] [<c069fb8c>] (sock_sendmsg) from [<c06a03dc>] (___sys_sendmsg+0x1ec/0x200)
      [13257.882489] [<c06a03dc>] (___sys_sendmsg) from [<c06a11c8>] (__sys_sendmsg+0x3c/0x64)
      [13257.890300] [<c06a11c8>] (__sys_sendmsg) from [<c0209b40>] (ret_fast_syscall+0x0/0x34)
      
      The original code just triggered the warning but do nothing. It will
      caused the shared conntrack moves to the dying list and the packet be
      droppped (nf_ct_resolve_clash returns NF_DROP for dying conntrack).
      
      - Reproduce steps:
      
      +----------------------------+
      |          br0(bridge)       |
      |                            |
      +-+---------+---------+------+
        | eth0|   | eth1|   | eth2|
        |     |   |     |   |     |
        +--+--+   +--+--+   +---+-+
           |         |          |
           |         |          |
        +--+-+     +-+--+    +--+-+
        | PC1|     | PC2|    | PC3|
        +----+     +----+    +----+
      
      iptables -A FORWARD -m mark --mark 0x1000000/0x1000000 -j NFQUEUE --queue-num 100 --queue-bypass
      
      ps: Our nfq userspace program will set mark on packets whose connection
      has already been processed.
      
      PC1 sends broadcast packets simulated by hping3:
      
      hping3 --rand-source --udp 192.168.1.255 -i u100
      
      - Broadcast racing flow chart is as follow:
      
      br_handle_frame
        BR_HOOK(NFPROTO_BRIDGE, NF_BR_PRE_ROUTING, br_handle_frame_finish)
        // skb->_nfct (unconfirmed conntrack) is constructed at PRE_ROUTING stage
        br_handle_frame_finish
          // check if this packet is broadcast
          br_flood_forward
            br_flood
              list_for_each_entry_rcu(p, &br->port_list, list) // iterate through each port
                maybe_deliver
                  deliver_clone
                    skb = skb_clone(skb)
                    __br_forward
                      BR_HOOK(NFPROTO_BRIDGE, NF_BR_FORWARD,...)
                      // queue in our nfq and received by our userspace program
                      // goto __nf_conntrack_confirm with process context on CPU 1
          br_pass_frame_up
            BR_HOOK(NFPROTO_BRIDGE, NF_BR_LOCAL_IN,...)
            // goto __nf_conntrack_confirm with softirq context on CPU 0
      
      Because conntrack confirm can happen at both INPUT and POSTROUTING
      stage.  So with NFQUEUE running, skb->_nfct with the same unconfirmed
      conntrack could race on different core.
      
      This patch fixes a repeating kernel splat, now it is only displayed
      once.
      Signed-off-by: 's avatarChieh-Min Wang <chiehminw@synology.com>
      Signed-off-by: 's avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: 's avatarSasha Levin <sashal@kernel.org>
      f74b0a4b
    • Florian Westphal's avatar
      netfilter: conntrack: tcp: only close if RST matches exact sequence · b39898be
      Florian Westphal authored
      [ Upstream commit be0502a3 ]
      
      TCP resets cause instant transition from established to closed state
      provided the reset is in-window.  Endpoints that implement RFC 5961
      require resets to match the next expected sequence number.
      RST segments that are in-window (but that do not match RCV.NXT) are
      ignored, and a "challenge ACK" is sent back.
      
      Main problem for conntrack is that its a middlebox, i.e.  whereas an end
      host might have ACK'd SEQ (and would thus accept an RST with this
      sequence number), conntrack might not have seen this ACK (yet).
      
      Therefore we can't simply flag RSTs with non-exact match as invalid.
      
      This updates RST processing as follows:
      
      1. If the connection is in a state other than ESTABLISHED, nothing is
         changed, RST is subject to normal in-window check.
      
      2. If the RSTs sequence number either matches exactly RCV.NXT,
         connection state moves to CLOSE.
      
      3. The same applies if the RST sequence number aligns with a previous
         packet in the same direction.
      
      In all other cases, the connection remains in ESTABLISHED state.
      If the normal-in-window check passes, the timeout will be lowered
      to that of CLOSE.
      
      If the peer sends a challenge ack, connection timeout will be reset.
      
      If the challenge ACK triggers another RST (RST was valid after all),
      this 2nd RST will match expected sequence and conntrack state changes to
      CLOSE.
      
      If no challenge ACK is received, the connection will time out after
      CLOSE seconds (10 seconds by default), just like without this patch.
      
      Packetdrill test case:
      
      0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      0.000 bind(3, ..., ...) = 0
      0.000 listen(3, 1) = 0
      
      0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7>
      0.100 > S. 0:0(0) ack 1 win 64240 <mss 1460,nop,nop,sackOK,nop,wscale 7>
      0.200 < . 1:1(0) ack 1 win 257
      0.200 accept(3, ..., ...) = 4
      
      // Receive a segment.
      0.210 < P. 1:1001(1000) ack 1 win 46
      0.210 > . 1:1(0) ack 1001
      
      // Application writes 1000 bytes.
      0.250 write(4, ..., 1000) = 1000
      0.250 > P. 1:1001(1000) ack 1001
      
      // First reset, old sequence. Conntrack (correctly) considers this
      // invalid due to failed window validation (regardless of this patch).
      0.260 < R  2:2(0) ack 1001 win 260
      
      // 2nd reset, but too far ahead sequence.  Same: correctly handled
      // as invalid.
      0.270 < R 99990001:99990001(0) ack 1001 win 260
      
      // in-window, but not exact sequence.
      // Current Linux kernels might reply with a challenge ack, and do not
      // remove connection.
      // Without this patch, conntrack state moves to CLOSE.
      // With patch, timeout is lowered like CLOSE, but connection stays
      // in ESTABLISHED state.
      0.280 < R 1010:1010(0) ack 1001 win 260
      
      // Expect challenge ACK
      0.281 > . 1001:1001(0) ack 1001 win 501
      
      // With or without this patch, RST will cause connection
      // to move to CLOSE (sequence number matches)
      // 0.282 < R 1001:1001(0) ack 1001 win 260
      
      // ACK
      0.300 < . 1001:1001(0) ack 1001 win 257
      
      // more data could be exchanged here, connection
      // is still established
      
      // Client closes the connection.
      0.610 < F. 1001:1001(0) ack 1001 win 260
      0.650 > . 1001:1001(0) ack 1002
      
      // Close the connection without reading outstanding data
      0.700 close(4) = 0
      
      // so one more reset.  Will be deemed acceptable with patch as well:
      // connection is already closing.
      0.701 > R. 1001:1001(0) ack 1002 win 501
      // End packetdrill test case.
      
      With patch, this generates following conntrack events:
         [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [UNREPLIED]
      [UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80
      [UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
      [UPDATE] 120 FIN_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
      [UPDATE] 60 CLOSE_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
      [UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED]
      
      Without patch, first RST moves connection to close, whereas socket state
      does not change until FIN is received.
         [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [UNREPLIED]
      [UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80
      [UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]
      [UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED]
      
      Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: 's avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: 's avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: 's avatarSasha Levin <sashal@kernel.org>
      b39898be
    • Li RongQing's avatar
      netfilter: nf_tables: check the result of dereferencing base_chain->stats · fdb08cf7
      Li RongQing authored
      [ Upstream commit a9f5e78c ]
      
      Check the result of dereferencing base_chain->stats, instead of result
      of this_cpu_ptr with NULL.
      
      base_chain->stats maybe be changed to NULL when a chain is updated and a
      new NULL counter can be attached.
      
      And we do not need to check returning of this_cpu_ptr since
      base_chain->stats is from percpu allocator if it is non-NULL,
      this_cpu_ptr returns a valid value.
      
      And fix two sparse error by replacing rcu_access_pointer and
      rcu_dereference with READ_ONCE under rcu_read_lock.
      
      Thanks for Eric's help to finish this patch.
      
      Fixes: 00924094 ("netfilter: nf_tables: don't assume chain stats are set when jumplabel is set")
      Signed-off-by: 's avatarEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: 's avatarZhang Yu <zhangyu31@baidu.com>
      Signed-off-by: 's avatarLi RongQing <lirongqing@baidu.com>
      Signed-off-by: 's avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: 's avatarSasha Levin <sashal@kernel.org>
      fdb08cf7
    • Björn Töpel's avatar
      xsk: fix to reject invalid flags in xsk_bind · b48475a6
      Björn Töpel authored
      [ Upstream commit f54ba391 ]
      
      Passing a non-existing flag in the sxdp_flags member of struct
      sockaddr_xdp was, incorrectly, silently ignored. This patch addresses
      that behavior, and rejects any non-existing flags.
      
      We have examined existing user space code, and to our best knowledge,
      no one is relying on the current incorrect behavior. AF_XDP is still
      in its infancy, so from our perspective, the risk of breakage is very
      low, and addressing this problem now is important.
      
      Fixes: 965a9909 ("xsk: add support for bind for Rx")
      Signed-off-by: 's avatarBjörn Töpel <bjorn.topel@intel.com>
      Signed-off-by: 's avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: 's avatarSasha Levin <sashal@kernel.org>
      b48475a6
  2. 03 Apr, 2019 17 commits
    • Herbert Xu's avatar
      ila: Fix rhashtable walker list corruption · d01bf376
      Herbert Xu authored
      [ Upstream commit b5f9bd15 ]
      
      ila_xlat_nl_cmd_flush uses rhashtable walkers allocated from the
      stack but it never frees them.  This corrupts the walker list of
      the hash table.
      
      This patch fixes it.
      
      Reported-by: syzbot+dae72a112334aa65a159@syzkaller.appspotmail.com
      Fixes: b6e71bde ("ila: Flush netlink command to clear xlat...")
      Signed-off-by: 's avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d01bf376
    • Erik Hugne's avatar
      tipc: fix cancellation of topology subscriptions · 9868ffd4
      Erik Hugne authored
      [ Upstream commit 33872d79 ]
      
      When cancelling a subscription, we have to clear the cancel bit in the
      request before iterating over any established subscriptions with memcmp.
      Otherwise no subscription will ever be found, and it will not be
      possible to explicitly unsubscribe individual subscriptions.
      
      Fixes: 8985ecc7 ("tipc: simplify endianness handling in topology subscriber")
      Signed-off-by: 's avatarErik Hugne <erik.hugne@gmail.com>
      Signed-off-by: 's avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9868ffd4
    • Xin Long's avatar
      tipc: change to check tipc_own_id to return in tipc_net_stop · e13fbdf6
      Xin Long authored
      [ Upstream commit 9926cb5f ]
      
      When running a syz script, a panic occurred:
      
      [  156.088228] BUG: KASAN: use-after-free in tipc_disc_timeout+0x9c9/0xb20 [tipc]
      [  156.094315] Call Trace:
      [  156.094844]  <IRQ>
      [  156.095306]  dump_stack+0x7c/0xc0
      [  156.097346]  print_address_description+0x65/0x22e
      [  156.100445]  kasan_report.cold.3+0x37/0x7a
      [  156.102402]  tipc_disc_timeout+0x9c9/0xb20 [tipc]
      [  156.106517]  call_timer_fn+0x19a/0x610
      [  156.112749]  run_timer_softirq+0xb51/0x1090
      
      It was caused by the netns freed without deleting the discoverer timer,
      while later on the netns would be accessed in the timer handler.
      
      The timer should have been deleted by tipc_net_stop() when cleaning up a
      netns. However, tipc has been able to enable a bearer and start d->timer
      without the local node_addr set since Commit 52dfae5c ("tipc: obtain
      node identity from interface by default"), which caused the timer not to
      be deleted in tipc_net_stop() then.
      
      So fix it in tipc_net_stop() by changing to check local node_id instead
      of local node_addr, as Jon suggested.
      
      While at it, remove the calling of tipc_nametbl_withdraw() there, since
      tipc_nametbl_stop() will take of the nametbl's freeing after.
      
      Fixes: 52dfae5c ("tipc: obtain node identity from interface by default")
      Reported-by: syzbot+a25307ad099309f1c2b9@syzkaller.appspotmail.com
      Signed-off-by: 's avatarXin Long <lucien.xin@gmail.com>
      Acked-by: Ying Xue's avatarYing Xue <ying.xue@windriver.com>
      Acked-by: 's avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e13fbdf6
    • Erik Hugne's avatar
      tipc: allow service ranges to be connect()'ed on RDM/DGRAM · 30e2a9a3
      Erik Hugne authored
      [ Upstream commit ea239314 ]
      
      We move the check that prevents connecting service ranges to after
      the RDM/DGRAM check, and move address sanity control to a separate
      function that also validates the service range.
      
      Fixes: 23998835 ("tipc: improve address sanity check in tipc_connect()")
      Signed-off-by: 's avatarErik Hugne <erik.hugne@gmail.com>
      Signed-off-by: 's avatarJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      30e2a9a3
    • Eric Dumazet's avatar
      tcp: do not use ipv6 header for ipv4 flow · 632f3ed8
      Eric Dumazet authored
      [ Upstream commit 89e41309 ]
      
      When a dual stack tcp listener accepts an ipv4 flow,
      it should not attempt to use an ipv6 header or tcp_v6_iif() helper.
      
      Fixes: 1397ed35 ("ipv6: add flowinfo for tcp6 pkt_options for all cases")
      Fixes: df3687ff ("ipv6: add the IPV6_FL_F_REFLECT flag to IPV6_FL_A_GET")
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: 's avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      632f3ed8
    • Xin Long's avatar
      sctp: use memdup_user instead of vmemdup_user · 118ad2c7
      Xin Long authored
      [ Upstream commit ef82bcfa ]
      
      In sctp_setsockopt_bindx()/__sctp_setsockopt_connectx(), it allocates
      memory with addrs_size which is passed from userspace. We used flag
      GFP_USER to put some more restrictions on it in Commit cacc0621
      ("sctp: use GFP_USER for user-controlled kmalloc").
      
      However, since Commit c981f254 ("sctp: use vmemdup_user() rather
      than badly open-coding memdup_user()"), vmemdup_user() has been used,
      which doesn't check GFP_USER flag when goes to vmalloc_*(). So when
      addrs_size is a huge value, it could exhaust memory and even trigger
      oom killer.
      
      This patch is to use memdup_user() instead, in which GFP_USER would
      work to limit the memory allocation with a huge addrs_size.
      
      Note we can't fix it by limiting 'addrs_size', as there's no demand
      for it from RFC.
      
      Reported-by: syzbot+ec1b7575afef85a0e5ca@syzkaller.appspotmail.com
      Fixes: c981f254 ("sctp: use vmemdup_user() rather than badly open-coding memdup_user()")
      Signed-off-by: 's avatarXin Long <lucien.xin@gmail.com>
      Acked-by: 's avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      118ad2c7
    • Maxime Chevallier's avatar
      packets: Always register packet sk in the same order · 278c7d7e
      Maxime Chevallier authored
      [ Upstream commit a4dc6a49 ]
      
      When using fanouts with AF_PACKET, the demux functions such as
      fanout_demux_cpu will return an index in the fanout socket array, which
      corresponds to the selected socket.
      
      The ordering of this array depends on the order the sockets were added
      to a given fanout group, so for FANOUT_CPU this means sockets are bound
      to cpus in the order they are configured, which is OK.
      
      However, when stopping then restarting the interface these sockets are
      bound to, the sockets are reassigned to the fanout group in the reverse
      order, due to the fact that they were inserted at the head of the
      interface's AF_PACKET socket list.
      
      This means that traffic that was directed to the first socket in the
      fanout group is now directed to the last one after an interface restart.
      
      In the case of FANOUT_CPU, traffic from CPU0 will be directed to the
      socket that used to receive traffic from the last CPU after an interface
      restart.
      
      This commit introduces a helper to add a socket at the tail of a list,
      then uses it to register AF_PACKET sockets.
      
      Note that this changes the order in which sockets are listed in /proc and
      with sock_diag.
      
      Fixes: dc99f600 ("packet: Add fanout support")
      Signed-off-by: 's avatarMaxime Chevallier <maxime.chevallier@bootlin.com>
      Acked-by: 's avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      278c7d7e
    • YueHaibing's avatar
      net-sysfs: call dev_hold if kobject_init_and_add success · 566e793d
      YueHaibing authored
      [ Upstream commit a3e23f71 ]
      
      In netdev_queue_add_kobject and rx_queue_add_kobject,
      if sysfs_create_group failed, kobject_put will call
      netdev_queue_release to decrease dev refcont, however
      dev_hold has not be called. So we will see this while
      unregistering dev:
      
      unregister_netdevice: waiting for bcsh0 to become free. Usage count = -1
      Reported-by: 's avatarHulk Robot <hulkci@huawei.com>
      Fixes: d0d66837 ("net: don't decrement kobj reference count on init failure")
      Signed-off-by: 's avatarYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      566e793d
    • Eric Dumazet's avatar
      net: rose: fix a possible stack overflow · 8cf288b5
      Eric Dumazet authored
      [ Upstream commit e5dcc0c3 ]
      
      rose_write_internal() uses a temp buffer of 100 bytes, but a manual
      inspection showed that given arbitrary input, rose_create_facilities()
      can fill up to 110 bytes.
      
      Lets use a tailroom of 256 bytes for peace of mind, and remove
      the bounce buffer : we can simply allocate a big enough skb
      and adjust its length as needed.
      
      syzbot report :
      
      BUG: KASAN: stack-out-of-bounds in memcpy include/linux/string.h:352 [inline]
      BUG: KASAN: stack-out-of-bounds in rose_create_facilities net/rose/rose_subr.c:521 [inline]
      BUG: KASAN: stack-out-of-bounds in rose_write_internal+0x597/0x15d0 net/rose/rose_subr.c:116
      Write of size 7 at addr ffff88808b1ffbef by task syz-executor.0/24854
      
      CPU: 0 PID: 24854 Comm: syz-executor.0 Not tainted 5.0.0+ #97
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
       kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
       check_memory_region_inline mm/kasan/generic.c:185 [inline]
       check_memory_region+0x123/0x190 mm/kasan/generic.c:191
       memcpy+0x38/0x50 mm/kasan/common.c:131
       memcpy include/linux/string.h:352 [inline]
       rose_create_facilities net/rose/rose_subr.c:521 [inline]
       rose_write_internal+0x597/0x15d0 net/rose/rose_subr.c:116
       rose_connect+0x7cb/0x1510 net/rose/af_rose.c:826
       __sys_connect+0x266/0x330 net/socket.c:1685
       __do_sys_connect net/socket.c:1696 [inline]
       __se_sys_connect net/socket.c:1693 [inline]
       __x64_sys_connect+0x73/0xb0 net/socket.c:1693
       do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x458079
      Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f47b8d9dc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000458079
      RDX: 000000000000001c RSI: 0000000020000040 RDI: 0000000000000004
      RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007f47b8d9e6d4
      R13: 00000000004be4a4 R14: 00000000004ceca8 R15: 00000000ffffffff
      
      The buggy address belongs to the page:
      page:ffffea00022c7fc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
      flags: 0x1fffc0000000000()
      raw: 01fffc0000000000 0000000000000000 ffffffff022c0101 0000000000000000
      raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff88808b1ffa80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff88808b1ffb00: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 00 03
      >ffff88808b1ffb80: f2 f2 00 00 00 00 00 00 00 00 00 00 00 00 04 f3
                                                                   ^
       ffff88808b1ffc00: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
       ffff88808b1ffc80: 00 00 00 00 00 00 00 f1 f1 f1 f1 f1 f1 01 f2 01
      Signed-off-by: 's avatarEric Dumazet <edumazet@google.com>
      Reported-by: 's avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8cf288b5
    • Christoph Paasch's avatar
      net/packet: Set __GFP_NOWARN upon allocation in alloc_pg_vec · 3ca86ad4
      Christoph Paasch authored
      [ Upstream commit 398f0132 ]
      
      Since commit fc62814d ("net/packet: fix 4gb buffer limit due to overflow check")
      one can now allocate packet ring buffers >= UINT_MAX. However, syzkaller
      found that that triggers a warning:
      
      [   21.100000] WARNING: CPU: 2 PID: 2075 at mm/page_alloc.c:4584 __alloc_pages_nod0
      [   21.101490] Modules linked in:
      [   21.101921] CPU: 2 PID: 2075 Comm: syz-executor.0 Not tainted 5.0.0 #146
      [   21.102784] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
      [   21.103887] RIP: 0010:__alloc_pages_nodemask+0x2a0/0x630
      [   21.104640] Code: fe ff ff 65 48 8b 04 25 c0 de 01 00 48 05 90 0f 00 00 41 bd 01 00 00 00 48 89 44 24 48 e9 9c fe 3
      [   21.107121] RSP: 0018:ffff88805e1cf920 EFLAGS: 00010246
      [   21.107819] RAX: 0000000000000000 RBX: ffffffff85a488a0 RCX: 0000000000000000
      [   21.108753] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000000
      [   21.109699] RBP: 1ffff1100bc39f28 R08: ffffed100bcefb67 R09: ffffed100bcefb67
      [   21.110646] R10: 0000000000000001 R11: ffffed100bcefb66 R12: 000000000000000d
      [   21.111623] R13: 0000000000000000 R14: ffff88805e77d888 R15: 000000000000000d
      [   21.112552] FS:  00007f7c7de05700(0000) GS:ffff88806d100000(0000) knlGS:0000000000000000
      [   21.113612] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   21.114405] CR2: 000000000065c000 CR3: 000000005e58e006 CR4: 00000000001606e0
      [   21.115367] Call Trace:
      [   21.115705]  ? __alloc_pages_slowpath+0x21c0/0x21c0
      [   21.116362]  alloc_pages_current+0xac/0x1e0
      [   21.116923]  kmalloc_order+0x18/0x70
      [   21.117393]  kmalloc_order_trace+0x18/0x110
      [   21.117949]  packet_set_ring+0x9d5/0x1770
      [   21.118524]  ? packet_rcv_spkt+0x440/0x440
      [   21.119094]  ? lock_downgrade+0x620/0x620
      [   21.119646]  ? __might_fault+0x177/0x1b0
      [   21.120177]  packet_setsockopt+0x981/0x2940
      [   21.120753]  ? __fget+0x2fb/0x4b0
      [   21.121209]  ? packet_release+0xab0/0xab0
      [   21.121740]  ? sock_has_perm+0x1cd/0x260
      [   21.122297]  ? selinux_secmark_relabel_packet+0xd0/0xd0
      [   21.123013]  ? __fget+0x324/0x4b0
      [   21.123451]  ? selinux_netlbl_socket_setsockopt+0x101/0x320
      [   21.124186]  ? selinux_netlbl_sock_rcv_skb+0x3a0/0x3a0
      [   21.124908]  ? __lock_acquire+0x529/0x3200
      [   21.125453]  ? selinux_socket_setsockopt+0x5d/0x70
      [   21.126075]  ? __sys_setsockopt+0x131/0x210
      [   21.126533]  ? packet_release+0xab0/0xab0
      [   21.127004]  __sys_setsockopt+0x131/0x210
      [   21.127449]  ? kernel_accept+0x2f0/0x2f0
      [   21.127911]  ? ret_from_fork+0x8/0x50
      [   21.128313]  ? do_raw_spin_lock+0x11b/0x280
      [   21.128800]  __x64_sys_setsockopt+0xba/0x150
      [   21.129271]  ? lockdep_hardirqs_on+0x37f/0x560
      [   21.129769]  do_syscall_64+0x9f/0x450
      [   21.130182]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      We should allocate with __GFP_NOWARN to handle this.
      
      Cc: Kal Conley <kal.conley@dectris.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Fixes: fc62814d ("net/packet: fix 4gb buffer limit due to overflow check")
      Signed-off-by: 's avatarChristoph Paasch <cpaasch@apple.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3ca86ad4
    • Paolo Abeni's avatar
      net: datagram: fix unbounded loop in __skb_try_recv_datagram() · 475af634
      Paolo Abeni authored
      [ Upstream commit 0b91bce1 ]
      
      Christoph reported a stall while peeking datagram with an offset when
      busy polling is enabled. __skb_try_recv_datagram() uses as the loop
      termination condition 'queue empty'. When peeking, the socket
      queue can be not empty, even when no additional packets are received.
      
      Address the issue explicitly checking for receive queue changes,
      as currently done by __skb_wait_for_more_packets().
      
      Fixes: 2b5cd0df ("net: Change return type of sk_busy_loop from bool to void")
      Reported-and-tested-by: 's avatarChristoph Paasch <cpaasch@apple.com>
      Signed-off-by: 's avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      475af634
    • Xin Long's avatar
      ipv6: make ip6_create_rt_rcu return ip6_null_entry instead of NULL · 282c70c2
      Xin Long authored
      [ Upstream commit 1c87e79a ]
      
      Jianlin reported a crash:
      
        [  381.484332] BUG: unable to handle kernel NULL pointer dereference at 0000000000000068
        [  381.619802] RIP: 0010:fib6_rule_lookup+0xa3/0x160
        [  382.009615] Call Trace:
        [  382.020762]  <IRQ>
        [  382.030174]  ip6_route_redirect.isra.52+0xc9/0xf0
        [  382.050984]  ip6_redirect+0xb6/0xf0
        [  382.066731]  icmpv6_notify+0xca/0x190
        [  382.083185]  ndisc_redirect_rcv+0x10f/0x160
        [  382.102569]  ndisc_rcv+0xfb/0x100
        [  382.117725]  icmpv6_rcv+0x3f2/0x520
        [  382.133637]  ip6_input_finish+0xbf/0x460
        [  382.151634]  ip6_input+0x3b/0xb0
        [  382.166097]  ipv6_rcv+0x378/0x4e0
      
      It was caused by the lookup function __ip6_route_redirect() returns NULL in
      fib6_rule_lookup() when ip6_create_rt_rcu() returns NULL.
      
      So we fix it by simply making ip6_create_rt_rcu() return ip6_null_entry
      instead of NULL.
      
      v1->v2:
        - move down 'fallback:' to make it more readable.
      
      Fixes: e873e4b9 ("ipv6: use fib6_info_hold_safe() when necessary")
      Reported-by: 's avatarJianlin Shi <jishi@redhat.com>
      Suggested-by: 's avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: 's avatarXin Long <lucien.xin@gmail.com>
      Reviewed-by: 's avatarDavid Ahern <dsahern@gmail.com>
      Acked-by: 's avatarWei Wang <weiwan@google.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      282c70c2
    • YueHaibing's avatar
      genetlink: Fix a memory leak on error path · bd60a788
      YueHaibing authored
      [ Upstream commit ceabee6c ]
      
      In genl_register_family(), when idr_alloc() fails,
      we forget to free the memory we possibly allocate for
      family->attrbuf.
      Reported-by: 's avatarHulk Robot <hulkci@huawei.com>
      Fixes: 2ae0f17d ("genetlink: use idr to track families")
      Signed-off-by: 's avatarYueHaibing <yuehaibing@huawei.com>
      Reviewed-by: 's avatarKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bd60a788
    • Eric Dumazet's avatar
      dccp: do not use ipv6 header for ipv4 flow · 3b58f24b
      Eric Dumazet authored
      [ Upstream commit e0aa6770 ]
      
      When a dual stack dccp listener accepts an ipv4 flow,
      it should not attempt to use an ipv6 header or
      inet6_iif() helper.
      
      Fixes: 3df80d93 ("[DCCP]: Introduce DCCPv6")
      Signed-off-by: 's avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3b58f24b
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: fix set double-free in abort path · 400dded5
      Pablo Neira Ayuso authored
      [ Upstream commit 40ba1d9b ]
      
      The abort path can cause a double-free of an anonymous set.
      Added-and-to-be-aborted rule looks like this:
      
      udp dport { 137, 138 } drop
      
      The to-be-aborted transaction list looks like this:
      
      newset
      newsetelem
      newsetelem
      rule
      
      This gets walked in reverse order, so first pass disables the rule, the
      set elements, then the set.
      
      After synchronize_rcu(), we then destroy those in same order: rule, set
      element, set element, newset.
      
      Problem is that the anonymous set has already been bound to the rule, so
      the rule (lookup expression destructor) already frees the set, when then
      cause use-after-free when trying to delete the elements from this set,
      then try to free the set again when handling the newset expression.
      
      Rule releases the bound set in first place from the abort path, this
      causes the use-after-free on set element removal when undoing the new
      element transactions. To handle this, skip new element transaction if
      set is bound from the abort path.
      
      This is still causes the use-after-free on set element removal.  To
      handle this, remove transaction from the list when the set is already
      bound.
      
      Joint work with Florian Westphal.
      
      Fixes: f6ac8585 ("netfilter: nf_tables: unbind set in rule from commit path")
      Bugzilla: https://bugzilla.netfilter.org/show_bug.cgi?id=1325Acked-by: 's avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: 's avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: 's avatarSasha Levin <sashal@kernel.org>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      400dded5
    • Marcel Holtmann's avatar
      Bluetooth: Verify that l2cap_get_conf_opt provides large enough buffer · a556547b
      Marcel Holtmann authored
      commit 7c9cbd0b upstream.
      
      The function l2cap_get_conf_opt will return L2CAP_CONF_OPT_SIZE + opt->len
      as length value. The opt->len however is in control over the remote user
      and can be used by an attacker to gain access beyond the bounds of the
      actual packet.
      
      To prevent any potential leak of heap memory, it is enough to check that
      the resulting len calculation after calling l2cap_get_conf_opt is not
      below zero. A well formed packet will always return >= 0 here and will
      end with the length value being zero after the last option has been
      parsed. In case of malformed packets messing with the opt->len field the
      length value will become negative. If that is the case, then just abort
      and ignore the option.
      
      In case an attacker uses a too short opt->len value, then garbage will
      be parsed, but that is protected by the unknown option handling and also
      the option parameter size checks.
      Signed-off-by: Marcel Holtmann's avatarMarcel Holtmann <marcel@holtmann.org>
      Reviewed-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: 's avatarJohan Hedberg <johan.hedberg@intel.com>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a556547b
    • Marcel Holtmann's avatar
      Bluetooth: Check L2CAP option sizes returned from l2cap_get_conf_opt · 8dac9b8d
      Marcel Holtmann authored
      commit af3d5d1c upstream.
      
      When doing option parsing for standard type values of 1, 2 or 4 octets,
      the value is converted directly into a variable instead of a pointer. To
      avoid being tricked into being a pointer, check that for these option
      types that sizes actually match. In L2CAP every option is fixed size and
      thus it is prudent anyway to ensure that the remote side sends us the
      right option size along with option paramters.
      
      If the option size is not matching the option type, then that option is
      silently ignored. It is a protocol violation and instead of trying to
      give the remote attacker any further hints just pretend that option is
      not present and proceed with the default values. Implementation
      following the specification and its qualification procedures will always
      use the correct size and thus not being impacted here.
      
      To keep the code readable and consistent accross all options, a few
      cosmetic changes were also required.
      Signed-off-by: Marcel Holtmann's avatarMarcel Holtmann <marcel@holtmann.org>
      Reviewed-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: 's avatarJohan Hedberg <johan.hedberg@intel.com>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8dac9b8d
  3. 27 Mar, 2019 3 commits
  4. 23 Mar, 2019 5 commits
  5. 19 Mar, 2019 10 commits
    • Vlad Buslov's avatar
      net: sched: flower: insert new filter to idr after setting its mask · 15c5945f
      Vlad Buslov authored
      [ Upstream commit ecb3dea4 ]
      
      When adding new filter to flower classifier, fl_change() inserts it to
      handle_idr before initializing filter extensions and assigning it a mask.
      Normally this ordering doesn't matter because all flower classifier ops
      callbacks assume rtnl lock protection. However, when filter has an action
      that doesn't have its kernel module loaded, rtnl lock is released before
      call to request_module(). During this time the filter can be accessed bu
      concurrent task before its initialization is completed, which can lead to a
      crash.
      
      Example case of NULL pointer dereference in concurrent dump:
      
      Task 1                           Task 2
      
      tc_new_tfilter()
       fl_change()
        idr_alloc_u32(fnew)
        fl_set_parms()
         tcf_exts_validate()
          tcf_action_init()
           tcf_action_init_1()
            rtnl_unlock()
            request_module()
            ...                        rtnl_lock()
            				 tc_dump_tfilter()
            				  tcf_chain_dump()
      				   fl_walk()
      				    idr_get_next_ul()
      				    tcf_node_dump()
      				     tcf_fill_node()
      				      fl_dump()
      				       mask = &f->mask->key; <- NULL ptr
            rtnl_lock()
      
      Extension initialization and mask assignment don't depend on fnew->handle
      that is allocated by idr_alloc_u32(). Move idr allocation code after action
      creation and mask assignment in fl_change() to prevent concurrent access
      to not fully initialized filter when rtnl lock is released to load action
      module.
      
      Fixes: 01683a14 ("net: sched: refactor flower walk to iterate over idr")
      Signed-off-by: 's avatarVlad Buslov <vladbu@mellanox.com>
      Reviewed-by: 's avatarRoi Dayan <roid@mellanox.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      15c5945f
    • Adalbert Lazăr's avatar
      vsock/virtio: fix kernel panic from virtio_transport_reset_no_sock · 882e7866
      Adalbert Lazăr authored
      [ Upstream commit 4c404ce2 ]
      
      Previous to commit 22b5c0b6 ("vsock/virtio: fix kernel panic
      after device hot-unplug"), vsock_core_init() was called from
      virtio_vsock_probe(). Now, virtio_transport_reset_no_sock() can be called
      before vsock_core_init() has the chance to run.
      
      [Wed Feb 27 14:17:09 2019] BUG: unable to handle kernel NULL pointer dereference at 0000000000000110
      [Wed Feb 27 14:17:09 2019] #PF error: [normal kernel read fault]
      [Wed Feb 27 14:17:09 2019] PGD 0 P4D 0
      [Wed Feb 27 14:17:09 2019] Oops: 0000 [#1] SMP PTI
      [Wed Feb 27 14:17:09 2019] CPU: 3 PID: 59 Comm: kworker/3:1 Not tainted 5.0.0-rc7-390-generic-hvi #390
      [Wed Feb 27 14:17:09 2019] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [Wed Feb 27 14:17:09 2019] Workqueue: virtio_vsock virtio_transport_rx_work [vmw_vsock_virtio_transport]
      [Wed Feb 27 14:17:09 2019] RIP: 0010:virtio_transport_reset_no_sock+0x8c/0xc0 [vmw_vsock_virtio_transport_common]
      [Wed Feb 27 14:17:09 2019] Code: 35 8b 4f 14 48 8b 57 08 31 f6 44 8b 4f 10 44 8b 07 48 8d 7d c8 e8 84 f8 ff ff 48 85 c0 48 89 c3 74 2a e8 f7 31 03 00 48 89 df <48> 8b 80 10 01 00 00 e8 68 fb 69 ed 48 8b 75 f0 65 48 33 34 25 28
      [Wed Feb 27 14:17:09 2019] RSP: 0018:ffffb42701ab7d40 EFLAGS: 00010282
      [Wed Feb 27 14:17:09 2019] RAX: 0000000000000000 RBX: ffff9d79637ee080 RCX: 0000000000000003
      [Wed Feb 27 14:17:09 2019] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff9d79637ee080
      [Wed Feb 27 14:17:09 2019] RBP: ffffb42701ab7d78 R08: ffff9d796fae70e0 R09: ffff9d796f403500
      [Wed Feb 27 14:17:09 2019] R10: ffffb42701ab7d90 R11: 0000000000000000 R12: ffff9d7969d09240
      [Wed Feb 27 14:17:09 2019] R13: ffff9d79624e6840 R14: ffff9d7969d09318 R15: ffff9d796d48ff80
      [Wed Feb 27 14:17:09 2019] FS:  0000000000000000(0000) GS:ffff9d796fac0000(0000) knlGS:0000000000000000
      [Wed Feb 27 14:17:09 2019] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [Wed Feb 27 14:17:09 2019] CR2: 0000000000000110 CR3: 0000000427f22000 CR4: 00000000000006e0
      [Wed Feb 27 14:17:09 2019] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [Wed Feb 27 14:17:09 2019] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [Wed Feb 27 14:17:09 2019] Call Trace:
      [Wed Feb 27 14:17:09 2019]  virtio_transport_recv_pkt+0x63/0x820 [vmw_vsock_virtio_transport_common]
      [Wed Feb 27 14:17:09 2019]  ? kfree+0x17e/0x190
      [Wed Feb 27 14:17:09 2019]  ? detach_buf_split+0x145/0x160
      [Wed Feb 27 14:17:09 2019]  ? __switch_to_asm+0x40/0x70
      [Wed Feb 27 14:17:09 2019]  virtio_transport_rx_work+0xa0/0x106 [vmw_vsock_virtio_transport]
      [Wed Feb 27 14:17:09 2019] NET: Registered protocol family 40
      [Wed Feb 27 14:17:09 2019]  process_one_work+0x167/0x410
      [Wed Feb 27 14:17:09 2019]  worker_thread+0x4d/0x460
      [Wed Feb 27 14:17:09 2019]  kthread+0x105/0x140
      [Wed Feb 27 14:17:09 2019]  ? rescuer_thread+0x360/0x360
      [Wed Feb 27 14:17:09 2019]  ? kthread_destroy_worker+0x50/0x50
      [Wed Feb 27 14:17:09 2019]  ret_from_fork+0x35/0x40
      [Wed Feb 27 14:17:09 2019] Modules linked in: vmw_vsock_virtio_transport vmw_vsock_virtio_transport_common input_leds vsock serio_raw i2c_piix4 mac_hid qemu_fw_cfg autofs4 cirrus ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops virtio_net psmouse drm net_failover pata_acpi virtio_blk failover floppy
      
      Fixes: 22b5c0b6 ("vsock/virtio: fix kernel panic after device hot-unplug")
      Reported-by: 's avatarAlexandru Herghelegiu <aherghelegiu@bitdefender.com>
      Signed-off-by: 's avatarAdalbert Lazăr <alazar@bitdefender.com>
      Co-developed-by: 's avatarStefan Hajnoczi <stefanha@redhat.com>
      Reviewed-by: 's avatarStefan Hajnoczi <stefanha@redhat.com>
      Reviewed-by: 's avatarStefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      882e7866
    • Guillaume Nault's avatar
      tcp: handle inet_csk_reqsk_queue_add() failures · 3460bb19
      Guillaume Nault authored
      [  Upstream commit 9d3e1368 ]
      
      Commit 7716682c ("tcp/dccp: fix another race at listener
      dismantle") let inet_csk_reqsk_queue_add() fail, and adjusted
      {tcp,dccp}_check_req() accordingly. However, TFO and syncookies
      weren't modified, thus leaking allocated resources on error.
      
      Contrary to tcp_check_req(), in both syncookies and TFO cases,
      we need to drop the request socket. Also, since the child socket is
      created with inet_csk_clone_lock(), we have to unlock it and drop an
      extra reference (->sk_refcount is initially set to 2 and
      inet_csk_reqsk_queue_add() drops only one ref).
      
      For TFO, we also need to revert the work done by tcp_try_fastopen()
      (with reqsk_fastopen_remove()).
      
      Fixes: 7716682c ("tcp/dccp: fix another race at listener dismantle")
      Signed-off-by: 's avatarGuillaume Nault <gnault@redhat.com>
      Signed-off-by: 's avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3460bb19
    • Christoph Paasch's avatar
      tcp: Don't access TCP_SKB_CB before initializing it · dfcf44d2
      Christoph Paasch authored
      [ Upstream commit f2feaefd ]
      
      Since commit eeea10b8 ("tcp: add
      tcp_v4_fill_cb()/tcp_v4_restore_cb()"), tcp_vX_fill_cb is only called
      after tcp_filter(). That means, TCP_SKB_CB(skb)->end_seq still points to
      the IP-part of the cb.
      
      We thus should not mock with it, as this can trigger bugs (thanks
      syzkaller):
      [   12.349396] ==================================================================
      [   12.350188] BUG: KASAN: slab-out-of-bounds in ip6_datagram_recv_specific_ctl+0x19b3/0x1a20
      [   12.351035] Read of size 1 at addr ffff88006adbc208 by task test_ip6_datagr/1799
      
      Setting end_seq is actually no more necessary in tcp_filter as it gets
      initialized later on in tcp_vX_fill_cb.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Fixes: eeea10b8 ("tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()")
      Signed-off-by: 's avatarChristoph Paasch <cpaasch@apple.com>
      Signed-off-by: 's avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dfcf44d2
    • Soheil Hassas Yeganeh's avatar
      tcp: do not report TCP_CM_INQ of 0 for closed connections · 75c9b039
      Soheil Hassas Yeganeh authored
      [ Upstream commit 6466e715 ]
      
      Returning 0 as inq to userspace indicates there is no more data to
      read, and the application needs to wait for EPOLLIN. For a connection
      that has received FIN from the remote peer, however, the application
      must continue reading until getting EOF (return value of 0
      from tcp_recvmsg) or an error, if edge-triggered epoll (EPOLLET) is
      being used. Otherwise, the application will never receive a new
      EPOLLIN, since there is no epoll edge after the FIN.
      
      Return 1 when there is no data left on the queue but the
      connection has received FIN, so that the applications continue
      reading.
      
      Fixes: b75eba76 (tcp: send in-queue bytes in cmsg upon read)
      Signed-off-by: 's avatarSoheil Hassas Yeganeh <soheil@google.com>
      Acked-by: 's avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: 's avatarEric Dumazet <edumazet@google.com>
      Acked-by: 's avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      75c9b039
    • Xin Long's avatar
      sctp: remove sched init from sctp_stream_init · 05ad31a8
      Xin Long authored
      [ Upstream commit 2e990dfd ]
      
      syzbot reported a NULL-ptr deref caused by that sched->init() in
      sctp_stream_init() set stream->rr_next = NULL.
      
        kasan: GPF could be caused by NULL-ptr deref or user memory access
        RIP: 0010:sctp_sched_rr_dequeue+0xd3/0x170 net/sctp/stream_sched_rr.c:141
        Call Trace:
          sctp_outq_dequeue_data net/sctp/outqueue.c:90 [inline]
          sctp_outq_flush_data net/sctp/outqueue.c:1079 [inline]
          sctp_outq_flush+0xba2/0x2790 net/sctp/outqueue.c:1205
      
      All sched info is saved in sout->ext now, in sctp_stream_init()
      sctp_stream_alloc_out() will not change it, there's no need to
      call sched->init() again, since sctp_outq_init() has already
      done it.
      
      Fixes: 5bbbbe32 ("sctp: introduce stream scheduler foundations")
      Reported-by: syzbot+4c9934f20522c0efd657@syzkaller.appspotmail.com
      Signed-off-by: 's avatarXin Long <lucien.xin@gmail.com>
      Acked-by: 's avatarNeil Horman <nhorman@tuxdriver.com>
      Acked-by: 's avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      05ad31a8
    • David Howells's avatar
      rxrpc: Fix client call queueing, waiting for channel · 3aca8931
      David Howells authored
      [ Upstream commit 69ffaebb ]
      
      rxrpc_get_client_conn() adds a new call to the front of the waiting_calls
      queue if the connection it's going to use already exists.  This is bad as
      it allows calls to get starved out.
      
      Fix this by adding to the tail instead.
      
      Also change the other enqueue point in the same function to put it on the
      front (ie. when we have a new connection).  This makes the point that in
      the case of a new connection the new call goes at the front (though it
      doesn't actually matter since the queue should be unoccupied).
      
      Fixes: 45025bce ("rxrpc: Improve management and caching of client connection objects")
      Signed-off-by: 's avatarDavid Howells <dhowells@redhat.com>
      Reviewed-by: 's avatarMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3aca8931
    • Xin Long's avatar
      route: set the deleted fnhe fnhe_daddr to 0 in ip_del_fnhe to fix a race · ed98b01c
      Xin Long authored
      [ Upstream commit ee60ad21 ]
      
      The race occurs in __mkroute_output() when 2 threads lookup a dst:
      
        CPU A                 CPU B
        find_exception()
                              find_exception() [fnhe expires]
                              ip_del_fnhe() [fnhe is deleted]
        rt_bind_exception()
      
      In rt_bind_exception() it will bind a deleted fnhe with the new dst, and
      this dst will get no chance to be freed. It causes a dev defcnt leak and
      consecutive dmesg warnings:
      
        unregister_netdevice: waiting for ethX to become free. Usage count = 1
      
      Especially thanks Jon to identify the issue.
      
      This patch fixes it by setting fnhe_daddr to 0 in ip_del_fnhe() to stop
      binding the deleted fnhe with a new dst when checking fnhe's fnhe_daddr
      and daddr in rt_bind_exception().
      
      It works as both ip_del_fnhe() and rt_bind_exception() are protected by
      fnhe_lock and the fhne is freed by kfree_rcu().
      
      Fixes: deed49df ("route: check and remove route cache when we get route")
      Signed-off-by: 's avatarJon Maxwell <jmaxwell37@gmail.com>
      Signed-off-by: 's avatarXin Long <lucien.xin@gmail.com>
      Reviewed-by: 's avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ed98b01c
    • Eric Dumazet's avatar
      net/x25: reset state in x25_connect() · a6e37802
      Eric Dumazet authored
      [ Upstream commit ee74d0bd ]
      
      In case x25_connect() fails and frees the socket neighbour,
      we also need to undo the change done to x25->state.
      
      Before my last bug fix, we had use-after-free so this
      patch fixes a latent bug.
      
      syzbot report :
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 16137 Comm: syz-executor.1 Not tainted 5.0.0+ #117
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:x25_write_internal+0x1e8/0xdf0 net/x25/x25_subr.c:173
      Code: 00 40 88 b5 e0 fe ff ff 0f 85 01 0b 00 00 48 8b 8b 80 04 00 00 48 ba 00 00 00 00 00 fc ff df 48 8d 79 1c 48 89 fe 48 c1 ee 03 <0f> b6 34 16 48 89 fa 83 e2 07 83 c2 03 40 38 f2 7c 09 40 84 f6 0f
      RSP: 0018:ffff888076717a08 EFLAGS: 00010207
      RAX: ffff88805f2f2292 RBX: ffff8880a0ae6000 RCX: 0000000000000000
      kobject: 'loop5' (0000000018d0d0ee): kobject_uevent_env
      RDX: dffffc0000000000 RSI: 0000000000000003 RDI: 000000000000001c
      RBP: ffff888076717b40 R08: ffff8880950e0580 R09: ffffed100be5e46d
      R10: ffffed100be5e46c R11: ffff88805f2f2363 R12: ffff888065579840
      kobject: 'loop5' (0000000018d0d0ee): fill_kobj_path: path = '/devices/virtual/block/loop5'
      R13: 1ffff1100ece2f47 R14: 0000000000000013 R15: 0000000000000013
      FS:  00007fb88cf43700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f9a42a41028 CR3: 0000000087a67000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       x25_release+0xd0/0x340 net/x25/af_x25.c:658
       __sock_release+0xd3/0x2b0 net/socket.c:579
       sock_close+0x1b/0x30 net/socket.c:1162
       __fput+0x2df/0x8d0 fs/file_table.c:278
       ____fput+0x16/0x20 fs/file_table.c:309
       task_work_run+0x14a/0x1c0 kernel/task_work.c:113
       get_signal+0x1961/0x1d50 kernel/signal.c:2388
       do_signal+0x87/0x1940 arch/x86/kernel/signal.c:816
       exit_to_usermode_loop+0x244/0x2c0 arch/x86/entry/common.c:162
       prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
       syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
       do_syscall_64+0x52d/0x610 arch/x86/entry/common.c:293
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x457f29
      Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fb88cf42c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
      RAX: fffffffffffffe00 RBX: 0000000000000003 RCX: 0000000000457f29
      RDX: 0000000000000012 RSI: 0000000020000080 RDI: 0000000000000004
      RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007fb88cf436d4
      R13: 00000000004be462 R14: 00000000004cec98 R15: 00000000ffffffff
      Modules linked in:
      
      Fixes: 95d6ebd5 ("net/x25: fix use-after-free in x25_device_event()")
      Signed-off-by: 's avatarEric Dumazet <edumazet@google.com>
      Cc: andrew hendry <andrew.hendry@gmail.com>
      Reported-by: 's avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a6e37802
    • Eric Dumazet's avatar
      net/x25: fix use-after-free in x25_device_event() · 391c4c52
      Eric Dumazet authored
      [ Upstream commit 95d6ebd5 ]
      
      In case of failure x25_connect() does a x25_neigh_put(x25->neighbour)
      but forgets to clear x25->neighbour pointer, thus triggering use-after-free.
      
      Since the socket is visible in x25_list, we need to hold x25_list_lock
      to protect the operation.
      
      syzbot report :
      
      BUG: KASAN: use-after-free in x25_kill_by_device net/x25/af_x25.c:217 [inline]
      BUG: KASAN: use-after-free in x25_device_event+0x296/0x2b0 net/x25/af_x25.c:252
      Read of size 8 at addr ffff8880a030edd0 by task syz-executor003/7854
      
      CPU: 0 PID: 7854 Comm: syz-executor003 Not tainted 5.0.0+ #97
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
       kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:135
       x25_kill_by_device net/x25/af_x25.c:217 [inline]
       x25_device_event+0x296/0x2b0 net/x25/af_x25.c:252
       notifier_call_chain+0xc7/0x240 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394 [inline]
       raw_notifier_call_chain+0x2e/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1739
       call_netdevice_notifiers_extack net/core/dev.c:1751 [inline]
       call_netdevice_notifiers net/core/dev.c:1765 [inline]
       __dev_notify_flags+0x1e9/0x2c0 net/core/dev.c:7607
       dev_change_flags+0x10d/0x170 net/core/dev.c:7643
       dev_ifsioc+0x2b0/0x940 net/core/dev_ioctl.c:237
       dev_ioctl+0x1b8/0xc70 net/core/dev_ioctl.c:488
       sock_do_ioctl+0x1bd/0x300 net/socket.c:995
       sock_ioctl+0x32b/0x610 net/socket.c:1096
       vfs_ioctl fs/ioctl.c:46 [inline]
       file_ioctl fs/ioctl.c:509 [inline]
       do_vfs_ioctl+0xd6e/0x1390 fs/ioctl.c:696
       ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
       __do_sys_ioctl fs/ioctl.c:720 [inline]
       __se_sys_ioctl fs/ioctl.c:718 [inline]
       __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
       do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x4467c9
      Code: e8 0c e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 5b 07 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fdbea222d98 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 00000000006dbc58 RCX: 00000000004467c9
      RDX: 0000000020000340 RSI: 0000000000008914 RDI: 0000000000000003
      RBP: 00000000006dbc50 R08: 00007fdbea223700 R09: 0000000000000000
      R10: 00007fdbea223700 R11: 0000000000000246 R12: 00000000006dbc5c
      R13: 6000030030626669 R14: 0000000000000000 R15: 0000000030626669
      
      Allocated by task 7843:
       save_stack+0x45/0xd0 mm/kasan/common.c:73
       set_track mm/kasan/common.c:85 [inline]
       __kasan_kmalloc mm/kasan/common.c:495 [inline]
       __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:468
       kasan_kmalloc+0x9/0x10 mm/kasan/common.c:509
       kmem_cache_alloc_trace+0x151/0x760 mm/slab.c:3615
       kmalloc include/linux/slab.h:545 [inline]
       x25_link_device_up+0x46/0x3f0 net/x25/x25_link.c:249
       x25_device_event+0x116/0x2b0 net/x25/af_x25.c:242
       notifier_call_chain+0xc7/0x240 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394 [inline]
       raw_notifier_call_chain+0x2e/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1739
       call_netdevice_notifiers_extack net/core/dev.c:1751 [inline]
       call_netdevice_notifiers net/core/dev.c:1765 [inline]
       __dev_notify_flags+0x121/0x2c0 net/core/dev.c:7605
       dev_change_flags+0x10d/0x170 net/core/dev.c:7643
       dev_ifsioc+0x2b0/0x940 net/core/dev_ioctl.c:237
       dev_ioctl+0x1b8/0xc70 net/core/dev_ioctl.c:488
       sock_do_ioctl+0x1bd/0x300 net/socket.c:995
       sock_ioctl+0x32b/0x610 net/socket.c:1096
       vfs_ioctl fs/ioctl.c:46 [inline]
       file_ioctl fs/ioctl.c:509 [inline]
       do_vfs_ioctl+0xd6e/0x1390 fs/ioctl.c:696
       ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
       __do_sys_ioctl fs/ioctl.c:720 [inline]
       __se_sys_ioctl fs/ioctl.c:718 [inline]
       __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
       do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 7865:
       save_stack+0x45/0xd0 mm/kasan/common.c:73
       set_track mm/kasan/common.c:85 [inline]
       __kasan_slab_free+0x102/0x150 mm/kasan/common.c:457
       kasan_slab_free+0xe/0x10 mm/kasan/common.c:465
       __cache_free mm/slab.c:3494 [inline]
       kfree+0xcf/0x230 mm/slab.c:3811
       x25_neigh_put include/net/x25.h:253 [inline]
       x25_connect+0x8d8/0xde0 net/x25/af_x25.c:824
       __sys_connect+0x266/0x330 net/socket.c:1685
       __do_sys_connect net/socket.c:1696 [inline]
       __se_sys_connect net/socket.c:1693 [inline]
       __x64_sys_connect+0x73/0xb0 net/socket.c:1693
       do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff8880a030edc0
       which belongs to the cache kmalloc-256 of size 256
      The buggy address is located 16 bytes inside of
       256-byte region [ffff8880a030edc0, ffff8880a030eec0)
      The buggy address belongs to the page:
      page:ffffea000280c380 count:1 mapcount:0 mapping:ffff88812c3f07c0 index:0x0
      flags: 0x1fffc0000000200(slab)
      raw: 01fffc0000000200 ffffea0002806788 ffffea00027f0188 ffff88812c3f07c0
      raw: 0000000000000000 ffff8880a030e000 000000010000000c 0000000000000000
      page dumped because: kasan: bad access detected
      Signed-off-by: 's avatarEric Dumazet <edumazet@google.com>
      Reported-by: syzbot+04babcefcd396fabec37@syzkaller.appspotmail.com
      Cc: andrew hendry <andrew.hendry@gmail.com>
      Signed-off-by: 's avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: 's avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      391c4c52