1. 03 Apr, 2019 2 commits
    • Maxime Chevallier's avatar
      packets: Always register packet sk in the same order · 278c7d7e
      Maxime Chevallier authored
      [ Upstream commit a4dc6a49 ]
      
      When using fanouts with AF_PACKET, the demux functions such as
      fanout_demux_cpu will return an index in the fanout socket array, which
      corresponds to the selected socket.
      
      The ordering of this array depends on the order the sockets were added
      to a given fanout group, so for FANOUT_CPU this means sockets are bound
      to cpus in the order they are configured, which is OK.
      
      However, when stopping then restarting the interface these sockets are
      bound to, the sockets are reassigned to the fanout group in the reverse
      order, due to the fact that they were inserted at the head of the
      interface's AF_PACKET socket list.
      
      This means that traffic that was directed to the first socket in the
      fanout group is now directed to the last one after an interface restart.
      
      In the case of FANOUT_CPU, traffic from CPU0 will be directed to the
      socket that used to receive traffic from the last CPU after an interface
      restart.
      
      This commit introduces a helper to add a socket at the tail of a list,
      then uses it to register AF_PACKET sockets.
      
      Note that this changes the order in which sockets are listed in /proc and
      with sock_diag.
      
      Fixes: dc99f600 ("packet: Add fanout support")
      Signed-off-by: default avatarMaxime Chevallier <maxime.chevallier@bootlin.com>
      Acked-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      278c7d7e
    • Christoph Paasch's avatar
      net/packet: Set __GFP_NOWARN upon allocation in alloc_pg_vec · 3ca86ad4
      Christoph Paasch authored
      [ Upstream commit 398f0132 ]
      
      Since commit fc62814d ("net/packet: fix 4gb buffer limit due to overflow check")
      one can now allocate packet ring buffers >= UINT_MAX. However, syzkaller
      found that that triggers a warning:
      
      [   21.100000] WARNING: CPU: 2 PID: 2075 at mm/page_alloc.c:4584 __alloc_pages_nod0
      [   21.101490] Modules linked in:
      [   21.101921] CPU: 2 PID: 2075 Comm: syz-executor.0 Not tainted 5.0.0 #146
      [   21.102784] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
      [   21.103887] RIP: 0010:__alloc_pages_nodemask+0x2a0/0x630
      [   21.104640] Code: fe ff ff 65 48 8b 04 25 c0 de 01 00 48 05 90 0f 00 00 41 bd 01 00 00 00 48 89 44 24 48 e9 9c fe 3
      [   21.107121] RSP: 0018:ffff88805e1cf920 EFLAGS: 00010246
      [   21.107819] RAX: 0000000000000000 RBX: ffffffff85a488a0 RCX: 0000000000000000
      [   21.108753] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000000
      [   21.109699] RBP: 1ffff1100bc39f28 R08: ffffed100bcefb67 R09: ffffed100bcefb67
      [   21.110646] R10: 0000000000000001 R11: ffffed100bcefb66 R12: 000000000000000d
      [   21.111623] R13: 0000000000000000 R14: ffff88805e77d888 R15: 000000000000000d
      [   21.112552] FS:  00007f7c7de05700(0000) GS:ffff88806d100000(0000) knlGS:0000000000000000
      [   21.113612] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   21.114405] CR2: 000000000065c000 CR3: 000000005e58e006 CR4: 00000000001606e0
      [   21.115367] Call Trace:
      [   21.115705]  ? __alloc_pages_slowpath+0x21c0/0x21c0
      [   21.116362]  alloc_pages_current+0xac/0x1e0
      [   21.116923]  kmalloc_order+0x18/0x70
      [   21.117393]  kmalloc_order_trace+0x18/0x110
      [   21.117949]  packet_set_ring+0x9d5/0x1770
      [   21.118524]  ? packet_rcv_spkt+0x440/0x440
      [   21.119094]  ? lock_downgrade+0x620/0x620
      [   21.119646]  ? __might_fault+0x177/0x1b0
      [   21.120177]  packet_setsockopt+0x981/0x2940
      [   21.120753]  ? __fget+0x2fb/0x4b0
      [   21.121209]  ? packet_release+0xab0/0xab0
      [   21.121740]  ? sock_has_perm+0x1cd/0x260
      [   21.122297]  ? selinux_secmark_relabel_packet+0xd0/0xd0
      [   21.123013]  ? __fget+0x324/0x4b0
      [   21.123451]  ? selinux_netlbl_socket_setsockopt+0x101/0x320
      [   21.124186]  ? selinux_netlbl_sock_rcv_skb+0x3a0/0x3a0
      [   21.124908]  ? __lock_acquire+0x529/0x3200
      [   21.125453]  ? selinux_socket_setsockopt+0x5d/0x70
      [   21.126075]  ? __sys_setsockopt+0x131/0x210
      [   21.126533]  ? packet_release+0xab0/0xab0
      [   21.127004]  __sys_setsockopt+0x131/0x210
      [   21.127449]  ? kernel_accept+0x2f0/0x2f0
      [   21.127911]  ? ret_from_fork+0x8/0x50
      [   21.128313]  ? do_raw_spin_lock+0x11b/0x280
      [   21.128800]  __x64_sys_setsockopt+0xba/0x150
      [   21.129271]  ? lockdep_hardirqs_on+0x37f/0x560
      [   21.129769]  do_syscall_64+0x9f/0x450
      [   21.130182]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      We should allocate with __GFP_NOWARN to handle this.
      
      Cc: Kal Conley <kal.conley@dectris.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Fixes: fc62814d ("net/packet: fix 4gb buffer limit due to overflow check")
      Signed-off-by: default avatarChristoph Paasch <cpaasch@apple.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3ca86ad4
  2. 12 Feb, 2019 1 commit
  3. 17 Jan, 2019 1 commit
    • Nicolas Dichtel's avatar
      af_packet: fix raw sockets over 6in4 tunnel · 88a8121d
      Nicolas Dichtel authored
      Since commit cb9f1b78, scapy (which uses an AF_PACKET socket in
      SOCK_RAW mode) is unable to send a basic icmp packet over a sit tunnel:
      
      Here is a example of the setup:
      $ ip link set ntfp2 up
      $ ip addr add 10.125.0.1/24 dev ntfp2
      $ ip tunnel add tun1 mode sit ttl 64 local 10.125.0.1 remote 10.125.0.2 dev ntfp2
      $ ip addr add fd00:cafe:cafe::1/128 dev tun1
      $ ip link set dev tun1 up
      $ ip route add fd00:200::/64 dev tun1
      $ scapy
      >>> p = []
      >>> p += IPv6(src='fd00:100::1', dst='fd00:200::1')/ICMPv6EchoRequest()
      >>> send(p, count=1, inter=0.1)
      >>> quit()
      $ ip -s link ls dev tun1 | grep -A1 "TX.*errors"
          TX: bytes  packets  errors  dropped carrier collsns
          0          0        1       0       0       0
      
      The problem is that the network offset is set to the hard_header_len of the
      output device (tun1, ie 14 + 20) and in our case, because the packet is
      small (48 bytes) the pskb_inet_may_pull() fails (it tries to pull 40 bytes
      (ipv6 header) starting from the network offset).
      
      This problem is more generally related to device with variable hard header
      length. To avoid a too intrusive patch in the current release, a (ugly)
      workaround is proposed in this patch. It has to be cleaned up in net-next.
      
      Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=993675a3100b1
      Link: http://patchwork.ozlabs.org/patch/1024489/
      Fixes: cb9f1b78 ("ip: validate header length on virtual device xmit")
      CC: Willem de Bruijn <willemb@google.com>
      CC: Maxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: default avatarNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      88a8121d
  4. 09 Jan, 2019 1 commit
  5. 22 Dec, 2018 1 commit
  6. 21 Dec, 2018 1 commit
  7. 18 Dec, 2018 1 commit
  8. 23 Nov, 2018 1 commit
    • Willem de Bruijn's avatar
      packet: copy user buffers before orphan or clone · 5cd8d46e
      Willem de Bruijn authored
      tpacket_snd sends packets with user pages linked into skb frags. It
      notifies that pages can be reused when the skb is released by setting
      skb->destructor to tpacket_destruct_skb.
      
      This can cause data corruption if the skb is orphaned (e.g., on
      transmit through veth) or cloned (e.g., on mirror to another psock).
      
      Create a kernel-private copy of data in these cases, same as tun/tap
      zerocopy transmission. Reuse that infrastructure: mark the skb as
      SKBTX_ZEROCOPY_FRAG, which will trigger copy in skb_orphan_frags(_rx).
      
      Unlike other zerocopy packets, do not set shinfo destructor_arg to
      struct ubuf_info. tpacket_destruct_skb already uses that ptr to notify
      when the original skb is released and a timestamp is recorded. Do not
      change this timestamp behavior. The ubuf_info->callback is not needed
      anyway, as no zerocopy notification is expected.
      
      Mark destructor_arg as not-a-uarg by setting the lower bit to 1. The
      resulting value is not a valid ubuf_info pointer, nor a valid
      tpacket_snd frame address. Add skb_zcopy_.._nouarg helpers for this.
      
      The fix relies on features introduced in commit 52267790 ("sock:
      add MSG_ZEROCOPY"), so can be backported as is only to 4.14.
      
      Tested with from `./in_netns.sh ./txring_overwrite` from
      http://github.com/wdebruij/kerneltools/tests
      
      Fixes: 69e3c75f ("net: TX_RING and packet mmap")
      Reported-by: default avatarAnand H. Krishnan <anandhkrishnan@gmail.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5cd8d46e
  9. 05 Oct, 2018 1 commit
  10. 06 Sep, 2018 1 commit
    • Vincent Whitchurch's avatar
      packet: add sockopt to ignore outgoing packets · fa788d98
      Vincent Whitchurch authored
      Currently, the only way to ignore outgoing packets on a packet socket is
      via the BPF filter.  With MSG_ZEROCOPY, packets that are looped into
      AF_PACKET are copied in dev_queue_xmit_nit(), and this copy happens even
      if the filter run from packet_rcv() would reject them.  So the presence
      of a packet socket on the interface takes away the benefits of
      MSG_ZEROCOPY, even if the packet socket is not interested in outgoing
      packets.  (Even when MSG_ZEROCOPY is not used, the skb is unnecessarily
      cloned, but the cost for that is much lower.)
      
      Add a socket option to allow AF_PACKET sockets to ignore outgoing
      packets to solve this.  Note that the *BSDs already have something
      similar: BIOCSSEESENT/BIOCSDIRECTION and BIOCSDIRFILT.
      
      The first intended user is lldpd.
      Signed-off-by: default avatarVincent Whitchurch <vincent.whitchurch@axis.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fa788d98
  11. 01 Sep, 2018 1 commit
  12. 13 Aug, 2018 1 commit
  13. 06 Aug, 2018 1 commit
    • Willem de Bruijn's avatar
      packet: refine ring v3 block size test to hold one frame · 4576cd46
      Willem de Bruijn authored
      TPACKET_V3 stores variable length frames in fixed length blocks.
      Blocks must be able to store a block header, optional private space
      and at least one minimum sized frame.
      
      Frames, even for a zero snaplen packet, store metadata headers and
      optional reserved space.
      
      In the block size bounds check, ensure that the frame of the
      chosen configuration fits. This includes sockaddr_ll and optional
      tp_reserve.
      
      Syzbot was able to construct a ring with insuffient room for the
      sockaddr_ll in the header of a zero-length frame, triggering an
      out-of-bounds write in dev_parse_header.
      
      Convert the comparison to less than, as zero is a valid snap len.
      This matches the test for minimum tp_frame_size immediately below.
      
      Fixes: f6fb8f10 ("af-packet: TPACKET_V3 flexible buffer implementation.")
      Fixes: eb73190f ("net/packet: refine check for priv area size")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4576cd46
  14. 04 Aug, 2018 1 commit
  15. 12 Jul, 2018 1 commit
  16. 09 Jul, 2018 2 commits
  17. 07 Jul, 2018 1 commit
  18. 04 Jul, 2018 1 commit
  19. 28 Jun, 2018 1 commit
    • Linus Torvalds's avatar
      Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL · a11e1d43
      Linus Torvalds authored
      The poll() changes were not well thought out, and completely
      unexplained.  They also caused a huge performance regression, because
      "->poll()" was no longer a trivial file operation that just called down
      to the underlying file operations, but instead did at least two indirect
      calls.
      
      Indirect calls are sadly slow now with the Spectre mitigation, but the
      performance problem could at least be largely mitigated by changing the
      "->get_poll_head()" operation to just have a per-file-descriptor pointer
      to the poll head instead.  That gets rid of one of the new indirections.
      
      But that doesn't fix the new complexity that is completely unwarranted
      for the regular case.  The (undocumented) reason for the poll() changes
      was some alleged AIO poll race fixing, but we don't make the common case
      slower and more complex for some uncommon special case, so this all
      really needs way more explanations and most likely a fundamental
      redesign.
      
      [ This revert is a revert of about 30 different commits, not reverted
        individually because that would just be unnecessarily messy  - Linus ]
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a11e1d43
  20. 21 Jun, 2018 1 commit
    • Eric Dumazet's avatar
      net/packet: fix use-after-free · 945d015e
      Eric Dumazet authored
      We should put copy_skb in receive_queue only after
      a successful call to virtio_net_hdr_from_skb().
      
      syzbot report :
      
      BUG: KASAN: use-after-free in __skb_unlink include/linux/skbuff.h:1843 [inline]
      BUG: KASAN: use-after-free in __skb_dequeue include/linux/skbuff.h:1863 [inline]
      BUG: KASAN: use-after-free in skb_dequeue+0x16a/0x180 net/core/skbuff.c:2815
      Read of size 8 at addr ffff8801b044ecc0 by task syz-executor217/4553
      
      CPU: 0 PID: 4553 Comm: syz-executor217 Not tainted 4.18.0-rc1+ #111
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
       print_address_description+0x6c/0x20b mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
       __skb_unlink include/linux/skbuff.h:1843 [inline]
       __skb_dequeue include/linux/skbuff.h:1863 [inline]
       skb_dequeue+0x16a/0x180 net/core/skbuff.c:2815
       skb_queue_purge+0x26/0x40 net/core/skbuff.c:2852
       packet_set_ring+0x675/0x1da0 net/packet/af_packet.c:4331
       packet_release+0x630/0xd90 net/packet/af_packet.c:2991
       __sock_release+0xd7/0x260 net/socket.c:603
       sock_close+0x19/0x20 net/socket.c:1186
       __fput+0x35b/0x8b0 fs/file_table.c:209
       ____fput+0x15/0x20 fs/file_table.c:243
       task_work_run+0x1ec/0x2a0 kernel/task_work.c:113
       exit_task_work include/linux/task_work.h:22 [inline]
       do_exit+0x1b08/0x2750 kernel/exit.c:865
       do_group_exit+0x177/0x440 kernel/exit.c:968
       __do_sys_exit_group kernel/exit.c:979 [inline]
       __se_sys_exit_group kernel/exit.c:977 [inline]
       __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:977
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x4448e9
      Code: Bad RIP value.
      RSP: 002b:00007ffd5f777ca8 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004448e9
      RDX: 00000000004448e9 RSI: 000000000000fcfb RDI: 0000000000000001
      RBP: 00000000006cf018 R08: 00007ffd0000a45b R09: 0000000000000000
      R10: 00007ffd5f777e48 R11: 0000000000000202 R12: 00000000004021f0
      R13: 0000000000402280 R14: 0000000000000000 R15: 0000000000000000
      
      Allocated by task 4553:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
       kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
       skb_clone+0x1f5/0x500 net/core/skbuff.c:1282
       tpacket_rcv+0x28f7/0x3200 net/packet/af_packet.c:2221
       deliver_skb net/core/dev.c:1925 [inline]
       deliver_ptype_list_skb net/core/dev.c:1940 [inline]
       __netif_receive_skb_core+0x1bfb/0x3680 net/core/dev.c:4611
       __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693
       netif_receive_skb_internal+0x12e/0x7d0 net/core/dev.c:4767
       netif_receive_skb+0xbf/0x420 net/core/dev.c:4791
       tun_rx_batched.isra.55+0x4ba/0x8c0 drivers/net/tun.c:1571
       tun_get_user+0x2af1/0x42f0 drivers/net/tun.c:1981
       tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2009
       call_write_iter include/linux/fs.h:1795 [inline]
       new_sync_write fs/read_write.c:474 [inline]
       __vfs_write+0x6c6/0x9f0 fs/read_write.c:487
       vfs_write+0x1f8/0x560 fs/read_write.c:549
       ksys_write+0x101/0x260 fs/read_write.c:598
       __do_sys_write fs/read_write.c:610 [inline]
       __se_sys_write fs/read_write.c:607 [inline]
       __x64_sys_write+0x73/0xb0 fs/read_write.c:607
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 4553:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
       kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
       __cache_free mm/slab.c:3498 [inline]
       kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
       kfree_skbmem+0x154/0x230 net/core/skbuff.c:582
       __kfree_skb net/core/skbuff.c:642 [inline]
       kfree_skb+0x1a5/0x580 net/core/skbuff.c:659
       tpacket_rcv+0x189e/0x3200 net/packet/af_packet.c:2385
       deliver_skb net/core/dev.c:1925 [inline]
       deliver_ptype_list_skb net/core/dev.c:1940 [inline]
       __netif_receive_skb_core+0x1bfb/0x3680 net/core/dev.c:4611
       __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693
       netif_receive_skb_internal+0x12e/0x7d0 net/core/dev.c:4767
       netif_receive_skb+0xbf/0x420 net/core/dev.c:4791
       tun_rx_batched.isra.55+0x4ba/0x8c0 drivers/net/tun.c:1571
       tun_get_user+0x2af1/0x42f0 drivers/net/tun.c:1981
       tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2009
       call_write_iter include/linux/fs.h:1795 [inline]
       new_sync_write fs/read_write.c:474 [inline]
       __vfs_write+0x6c6/0x9f0 fs/read_write.c:487
       vfs_write+0x1f8/0x560 fs/read_write.c:549
       ksys_write+0x101/0x260 fs/read_write.c:598
       __do_sys_write fs/read_write.c:610 [inline]
       __se_sys_write fs/read_write.c:607 [inline]
       __x64_sys_write+0x73/0xb0 fs/read_write.c:607
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff8801b044ecc0
       which belongs to the cache skbuff_head_cache of size 232
      The buggy address is located 0 bytes inside of
       232-byte region [ffff8801b044ecc0, ffff8801b044eda8)
      The buggy address belongs to the page:
      page:ffffea0006c11380 count:1 mapcount:0 mapping:ffff8801d9be96c0 index:0x0
      flags: 0x2fffc0000000100(slab)
      raw: 02fffc0000000100 ffffea0006c17988 ffff8801d9bec248 ffff8801d9be96c0
      raw: 0000000000000000 ffff8801b044e040 000000010000000c 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8801b044eb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff8801b044ec00: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
      >ffff8801b044ec80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
                                                 ^
       ffff8801b044ed00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8801b044ed80: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc
      
      Fixes: 58d19b19 ("packet: vnet_hdr support for tpacket_rcv")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      945d015e
  21. 12 Jun, 2018 1 commit
    • Kees Cook's avatar
      treewide: Use array_size() in vzalloc() · fad953ce
      Kees Cook authored
      The vzalloc() function has no 2-factor argument form, so multiplication
      factors need to be wrapped in array_size(). This patch replaces cases of:
      
              vzalloc(a * b)
      
      with:
              vzalloc(array_size(a, b))
      
      as well as handling cases of:
      
              vzalloc(a * b * c)
      
      with:
      
              vzalloc(array3_size(a, b, c))
      
      This does, however, attempt to ignore constant size factors like:
      
              vzalloc(4 * 1024)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        vzalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        vzalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        vzalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
        vzalloc(
      -	sizeof(TYPE) * (COUNT_ID)
      +	array_size(COUNT_ID, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT_ID
      +	array_size(COUNT_ID, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * (COUNT_CONST)
      +	array_size(COUNT_CONST, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT_CONST
      +	array_size(COUNT_CONST, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT_ID)
      +	array_size(COUNT_ID, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT_ID
      +	array_size(COUNT_ID, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT_CONST)
      +	array_size(COUNT_CONST, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT_CONST
      +	array_size(COUNT_CONST, sizeof(THING))
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
        vzalloc(
      -	SIZE * COUNT
      +	array_size(COUNT, SIZE)
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        vzalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        vzalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        vzalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        vzalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        vzalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        vzalloc(C1 * C2 * C3, ...)
      |
        vzalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants.
      @@
      expression E1, E2;
      constant C1, C2;
      @@
      
      (
        vzalloc(C1 * C2, ...)
      |
        vzalloc(
      -	E1 * E2
      +	array_size(E1, E2)
        , ...)
      )
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      fad953ce
  22. 07 Jun, 2018 1 commit
    • Willem de Bruijn's avatar
      net: in virtio_net_hdr only add VLAN_HLEN to csum_start if payload holds vlan · fd3a8862
      Willem de Bruijn authored
      Tun, tap, virtio, packet and uml vector all use struct virtio_net_hdr
      to communicate packet metadata to userspace.
      
      For skbuffs with vlan, the first two return the packet as it may have
      existed on the wire, inserting the VLAN tag in the user buffer.  Then
      virtio_net_hdr.csum_start needs to be adjusted by VLAN_HLEN bytes.
      
      Commit f09e2249 ("macvtap: restore vlan header on user read")
      added this feature to macvtap. Commit 3ce9b20f ("macvtap: Fix
      csum_start when VLAN tags are present") then fixed up csum_start.
      
      Virtio, packet and uml do not insert the vlan header in the user
      buffer.
      
      When introducing virtio_net_hdr_from_skb to deduplicate filling in
      the virtio_net_hdr, the variant from macvtap which adds VLAN_HLEN was
      applied uniformly, breaking csum offset for packets with vlan on
      virtio and packet.
      
      Make insertion of VLAN_HLEN optional. Convert the callers to pass it
      when needed.
      
      Fixes: e858fae2 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion")
      Fixes: 1276f24e ("packet: use common code for virtio_net_hdr and skb GSO conversion")
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fd3a8862
  23. 04 Jun, 2018 1 commit
    • Eric Dumazet's avatar
      net/packet: refine check for priv area size · eb73190f
      Eric Dumazet authored
      syzbot was able to trick af_packet again [1]
      
      Various commits tried to address the problem in the past,
      but failed to take into account V3 header size.
      
      [1]
      
      tpacket_rcv: packet too big, clamped from 72 to 4294967224. macoff=96
      BUG: KASAN: use-after-free in prb_run_all_ft_ops net/packet/af_packet.c:1016 [inline]
      BUG: KASAN: use-after-free in prb_fill_curr_block.isra.59+0x4e5/0x5c0 net/packet/af_packet.c:1039
      Write of size 2 at addr ffff8801cb62000e by task kworker/1:2/2106
      
      CPU: 1 PID: 2106 Comm: kworker/1:2 Not tainted 4.17.0-rc7+ #77
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: ipv6_addrconf addrconf_dad_work
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1b9/0x294 lib/dump_stack.c:113
       print_address_description+0x6c/0x20b mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
       __asan_report_store2_noabort+0x17/0x20 mm/kasan/report.c:436
       prb_run_all_ft_ops net/packet/af_packet.c:1016 [inline]
       prb_fill_curr_block.isra.59+0x4e5/0x5c0 net/packet/af_packet.c:1039
       __packet_lookup_frame_in_block net/packet/af_packet.c:1094 [inline]
       packet_current_rx_frame net/packet/af_packet.c:1117 [inline]
       tpacket_rcv+0x1866/0x3340 net/packet/af_packet.c:2282
       dev_queue_xmit_nit+0x891/0xb90 net/core/dev.c:2018
       xmit_one net/core/dev.c:3049 [inline]
       dev_hard_start_xmit+0x16b/0xc10 net/core/dev.c:3069
       __dev_queue_xmit+0x2724/0x34c0 net/core/dev.c:3584
       dev_queue_xmit+0x17/0x20 net/core/dev.c:3617
       neigh_resolve_output+0x679/0xad0 net/core/neighbour.c:1358
       neigh_output include/net/neighbour.h:482 [inline]
       ip6_finish_output2+0xc9c/0x2810 net/ipv6/ip6_output.c:120
       ip6_finish_output+0x5fe/0xbc0 net/ipv6/ip6_output.c:154
       NF_HOOK_COND include/linux/netfilter.h:277 [inline]
       ip6_output+0x227/0x9b0 net/ipv6/ip6_output.c:171
       dst_output include/net/dst.h:444 [inline]
       NF_HOOK include/linux/netfilter.h:288 [inline]
       ndisc_send_skb+0x100d/0x1570 net/ipv6/ndisc.c:491
       ndisc_send_ns+0x3c1/0x8d0 net/ipv6/ndisc.c:633
       addrconf_dad_work+0xbef/0x1340 net/ipv6/addrconf.c:4033
       process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
       worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
       kthread+0x345/0x410 kernel/kthread.c:240
       ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
      
      The buggy address belongs to the page:
      page:ffffea00072d8800 count:0 mapcount:-127 mapping:0000000000000000 index:0xffff8801cb620e80
      flags: 0x2fffc0000000000()
      raw: 02fffc0000000000 0000000000000000 ffff8801cb620e80 00000000ffffff80
      raw: ffffea00072e3820 ffffea0007132d20 0000000000000002 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8801cb61ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff8801cb61ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      >ffff8801cb620000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                            ^
       ffff8801cb620080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff8801cb620100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      
      Fixes: 2b6867c2 ("net/packet: fix overflow in check for priv area size")
      Fixes: dc808110 ("packet: handle too big packets for PACKET_V3")
      Fixes: f6fb8f10 ("af-packet: TPACKET_V3 flexible buffer implementation.")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      eb73190f
  24. 26 May, 2018 1 commit
  25. 25 May, 2018 1 commit
    • Willem de Bruijn's avatar
      packet: fix reserve calculation · 9aad13b0
      Willem de Bruijn authored
      Commit b84bbaf7 ("packet: in packet_snd start writing at link
      layer allocation") ensures that packet_snd always starts writing
      the link layer header in reserved headroom allocated for this
      purpose.
      
      This is needed because packets may be shorter than hard_header_len,
      in which case the space up to hard_header_len may be zeroed. But
      that necessary padding is not accounted for in skb->len.
      
      The fix, however, is buggy. It calls skb_push, which grows skb->len
      when moving skb->data back. But in this case packet length should not
      change.
      
      Instead, call skb_reserve, which moves both skb->data and skb->tail
      back, without changing length.
      
      Fixes: b84bbaf7 ("packet: in packet_snd start writing at link layer allocation")
      Reported-by: default avatarTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9aad13b0
  26. 16 May, 2018 1 commit
  27. 14 May, 2018 1 commit
    • Willem de Bruijn's avatar
      packet: in packet_snd start writing at link layer allocation · b84bbaf7
      Willem de Bruijn authored
      Packet sockets allow construction of packets shorter than
      dev->hard_header_len to accommodate protocols with variable length
      link layer headers. These packets are padded to dev->hard_header_len,
      because some device drivers interpret that as a minimum packet size.
      
      packet_snd reserves dev->hard_header_len bytes on allocation.
      SOCK_DGRAM sockets call skb_push in dev_hard_header() to ensure that
      link layer headers are stored in the reserved range. SOCK_RAW sockets
      do the same in tpacket_snd, but not in packet_snd.
      
      Syzbot was able to send a zero byte packet to a device with massive
      116B link layer header, causing padding to cross over into skb_shinfo.
      Fix this by writing from the start of the llheader reserved range also
      in the case of packet_snd/SOCK_RAW.
      
      Update skb_set_network_header to the new offset. This also corrects
      it for SOCK_DGRAM, where it incorrectly double counted reserve due to
      the skb_push in dev_hard_header.
      
      Fixes: 9ed988cd ("packet: validate variable length ll headers")
      Reported-by: syzbot+71d74a5406d02057d559@syzkaller.appspotmail.com
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b84bbaf7
  28. 03 May, 2018 1 commit
  29. 24 Apr, 2018 1 commit
  30. 16 Apr, 2018 1 commit
    • Eric Dumazet's avatar
      net: af_packet: fix race in PACKET_{R|T}X_RING · 5171b37d
      Eric Dumazet authored
      In order to remove the race caught by syzbot [1], we need
      to lock the socket before using po->tp_version as this could
      change under us otherwise.
      
      This means lock_sock() and release_sock() must be done by
      packet_set_ring() callers.
      
      [1] :
      BUG: KMSAN: uninit-value in packet_set_ring+0x1254/0x3870 net/packet/af_packet.c:4249
      CPU: 0 PID: 20195 Comm: syzkaller707632 Not tainted 4.16.0+ #83
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x185/0x1d0 lib/dump_stack.c:53
       kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
       __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
       packet_set_ring+0x1254/0x3870 net/packet/af_packet.c:4249
       packet_setsockopt+0x12c6/0x5a90 net/packet/af_packet.c:3662
       SYSC_setsockopt+0x4b8/0x570 net/socket.c:1849
       SyS_setsockopt+0x76/0xa0 net/socket.c:1828
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      RIP: 0033:0x449099
      RSP: 002b:00007f42b5307ce8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
      RAX: ffffffffffffffda RBX: 000000000070003c RCX: 0000000000449099
      RDX: 0000000000000005 RSI: 0000000000000107 RDI: 0000000000000003
      RBP: 0000000000700038 R08: 000000000000001c R09: 0000000000000000
      R10: 00000000200000c0 R11: 0000000000000246 R12: 0000000000000000
      R13: 000000000080eecf R14: 00007f42b53089c0 R15: 0000000000000001
      
      Local variable description: ----req_u@packet_setsockopt
      Variable was created at:
       packet_setsockopt+0x13f/0x5a90 net/packet/af_packet.c:3612
       SYSC_setsockopt+0x4b8/0x570 net/socket.c:1849
      
      Fixes: f6fb8f10 ("af-packet: TPACKET_V3 flexible buffer implementation.")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5171b37d
  31. 27 Mar, 2018 1 commit
  32. 13 Feb, 2018 1 commit
  33. 12 Feb, 2018 1 commit
    • Denys Vlasenko's avatar
      net: make getname() functions return length rather than use int* parameter · 9b2c45d4
      Denys Vlasenko authored
      Changes since v1:
      Added changes in these files:
          drivers/infiniband/hw/usnic/usnic_transport.c
          drivers/staging/lustre/lnet/lnet/lib-socket.c
          drivers/target/iscsi/iscsi_target_login.c
          drivers/vhost/net.c
          fs/dlm/lowcomms.c
          fs/ocfs2/cluster/tcp.c
          security/tomoyo/network.c
      
      Before:
      All these functions either return a negative error indicator,
      or store length of sockaddr into "int *socklen" parameter
      and return zero on success.
      
      "int *socklen" parameter is awkward. For example, if caller does not
      care, it still needs to provide on-stack storage for the value
      it does not need.
      
      None of the many FOO_getname() functions of various protocols
      ever used old value of *socklen. They always just overwrite it.
      
      This change drops this parameter, and makes all these functions, on success,
      return length of sockaddr. It's always >= 0 and can be differentiated
      from an error.
      
      Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.
      
      rpc_sockname() lost "int buflen" parameter, since its only use was
      to be passed to kernel_getsockname() as &buflen and subsequently
      not used in any way.
      
      Userspace API is not changed.
      
          text    data     bss      dec     hex filename
      30108430 2633624  873672 33615726 200ef6e vmlinux.before.o
      30108109 2633612  873672 33615393 200ee21 vmlinux.o
      Signed-off-by: default avatarDenys Vlasenko <dvlasenk@redhat.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: linux-kernel@vger.kernel.org
      CC: netdev@vger.kernel.org
      CC: linux-bluetooth@vger.kernel.org
      CC: linux-decnet-user@lists.sourceforge.net
      CC: linux-wireless@vger.kernel.org
      CC: linux-rdma@vger.kernel.org
      CC: linux-sctp@vger.kernel.org
      CC: linux-nfs@vger.kernel.org
      CC: linux-x25@vger.kernel.org
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9b2c45d4
  34. 11 Feb, 2018 1 commit
    • Linus Torvalds's avatar
      vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds authored
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But they keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a9a08845
  35. 16 Jan, 2018 1 commit
    • Alexey Dobriyan's avatar
      net: delete /proc THIS_MODULE references · 96890d62
      Alexey Dobriyan authored
      /proc has been ignoring struct file_operations::owner field for 10 years.
      Specifically, it started with commit 786d7e16
      ("Fix rmmod/read/write races in /proc entries"). Notice the chunk where
      inode->i_fop is initialized with proxy struct file_operations for
      regular files:
      
      	-               if (de->proc_fops)
      	-                       inode->i_fop = de->proc_fops;
      	+               if (de->proc_fops) {
      	+                       if (S_ISREG(inode->i_mode))
      	+                               inode->i_fop = &proc_reg_file_ops;
      	+                       else
      	+                               inode->i_fop = de->proc_fops;
      	+               }
      
      VFS stopped pinning module at this point.
      Signed-off-by: default avatarAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      96890d62
  36. 20 Dec, 2017 1 commit
  37. 28 Nov, 2017 2 commits
    • Eric Dumazet's avatar
      net/packet: fix a race in packet_bind() and packet_notifier() · 15fe076e
      Eric Dumazet authored
      syzbot reported crashes [1] and provided a C repro easing bug hunting.
      
      When/if packet_do_bind() calls __unregister_prot_hook() and releases
      po->bind_lock, another thread can run packet_notifier() and process an
      NETDEV_UP event.
      
      This calls register_prot_hook() and hooks again the socket right before
      first thread is able to grab again po->bind_lock.
      
      Fixes this issue by temporarily setting po->num to 0, as suggested by
      David Miller.
      
      [1]
      dev_remove_pack: ffff8801bf16fa80 not found
      ------------[ cut here ]------------
      kernel BUG at net/core/dev.c:7945!  ( BUG_ON(!list_empty(&dev->ptype_all)); )
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      device syz0 entered promiscuous mode
      CPU: 0 PID: 3161 Comm: syzkaller404108 Not tainted 4.14.0+ #190
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      task: ffff8801cc57a500 task.stack: ffff8801cc588000
      RIP: 0010:netdev_run_todo+0x772/0xae0 net/core/dev.c:7945
      RSP: 0018:ffff8801cc58f598 EFLAGS: 00010293
      RAX: ffff8801cc57a500 RBX: dffffc0000000000 RCX: ffffffff841f75b2
      RDX: 0000000000000000 RSI: 1ffff100398b1ede RDI: ffff8801bf1f8810
      device syz0 entered promiscuous mode
      RBP: ffff8801cc58f898 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801bf1f8cd8
      R13: ffff8801cc58f870 R14: ffff8801bf1f8780 R15: ffff8801cc58f7f0
      FS:  0000000001716880(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020b13000 CR3: 0000000005e25000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       rtnl_unlock+0xe/0x10 net/core/rtnetlink.c:106
       tun_detach drivers/net/tun.c:670 [inline]
       tun_chr_close+0x49/0x60 drivers/net/tun.c:2845
       __fput+0x333/0x7f0 fs/file_table.c:210
       ____fput+0x15/0x20 fs/file_table.c:244
       task_work_run+0x199/0x270 kernel/task_work.c:113
       exit_task_work include/linux/task_work.h:22 [inline]
       do_exit+0x9bb/0x1ae0 kernel/exit.c:865
       do_group_exit+0x149/0x400 kernel/exit.c:968
       SYSC_exit_group kernel/exit.c:979 [inline]
       SyS_exit_group+0x1d/0x20 kernel/exit.c:977
       entry_SYSCALL_64_fastpath+0x1f/0x96
      RIP: 0033:0x44ad19
      
      Fixes: 30f7ea1c ("packet: race condition in packet_bind")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Cc: Francesco Ruggeri <fruggeri@aristanetworks.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      15fe076e
    • Mike Maloney's avatar
      packet: fix crash in fanout_demux_rollover() · 57f015f5
      Mike Maloney authored
      syzkaller found a race condition fanout_demux_rollover() while removing
      a packet socket from a fanout group.
      
      po->rollover is read and operated on during packet_rcv_fanout(), via
      fanout_demux_rollover(), but the pointer is currently cleared before the
      synchronization in packet_release().   It is safer to delay the cleanup
      until after synchronize_net() has been called, ensuring all calls to
      packet_rcv_fanout() for this socket have finished.
      
      To further simplify synchronization around the rollover structure, set
      po->rollover in fanout_add() only if there are no errors.  This removes
      the need for rcu in the struct and in the call to
      packet_getsockopt(..., PACKET_ROLLOVER_STATS, ...).
      
      Crashing stack trace:
       fanout_demux_rollover+0xb6/0x4d0 net/packet/af_packet.c:1392
       packet_rcv_fanout+0x649/0x7c8 net/packet/af_packet.c:1487
       dev_queue_xmit_nit+0x835/0xc10 net/core/dev.c:1953
       xmit_one net/core/dev.c:2975 [inline]
       dev_hard_start_xmit+0x16b/0xac0 net/core/dev.c:2995
       __dev_queue_xmit+0x17a4/0x2050 net/core/dev.c:3476
       dev_queue_xmit+0x17/0x20 net/core/dev.c:3509
       neigh_connected_output+0x489/0x720 net/core/neighbour.c:1379
       neigh_output include/net/neighbour.h:482 [inline]
       ip6_finish_output2+0xad1/0x22a0 net/ipv6/ip6_output.c:120
       ip6_finish_output+0x2f9/0x920 net/ipv6/ip6_output.c:146
       NF_HOOK_COND include/linux/netfilter.h:239 [inline]
       ip6_output+0x1f4/0x850 net/ipv6/ip6_output.c:163
       dst_output include/net/dst.h:459 [inline]
       NF_HOOK.constprop.35+0xff/0x630 include/linux/netfilter.h:250
       mld_sendpack+0x6a8/0xcc0 net/ipv6/mcast.c:1660
       mld_send_initial_cr.part.24+0x103/0x150 net/ipv6/mcast.c:2072
       mld_send_initial_cr net/ipv6/mcast.c:2056 [inline]
       ipv6_mc_dad_complete+0x99/0x130 net/ipv6/mcast.c:2079
       addrconf_dad_completed+0x595/0x970 net/ipv6/addrconf.c:4039
       addrconf_dad_work+0xac9/0x1160 net/ipv6/addrconf.c:3971
       process_one_work+0xbf0/0x1bc0 kernel/workqueue.c:2113
       worker_thread+0x223/0x1990 kernel/workqueue.c:2247
       kthread+0x35e/0x430 kernel/kthread.c:231
       ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:432
      
      Fixes: 0648ab70 ("packet: rollover prepare: per-socket state")
      Fixes: 509c7a1e ("packet: avoid panic in packet_getsockopt()")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarMike Maloney <maloney@google.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      57f015f5