1. 11 Jul, 2012 1 commit
  2. 27 Jun, 2012 1 commit
  3. 15 Apr, 2012 1 commit
    • net: add generic PF_BRIDGE:RTM_ FDB hooks · 77162022
      John Fastabend authored
      This adds two new flags, NTF_MASTER and NTF_SELF, that can
      now be used to specify where PF_BRIDGE netlink commands should
      be sent. NTF_MASTER sends the commands to the 'dev->master'
      device for parsing. Typically this will be the Linux net/bridge
      or Open vSwitch devices. Also, without any flags set, the command
      will be handled by the master device as well, so that current user
      space tools continue to work as expected.
      The NTF_SELF flag will push the PF_BRIDGE commands to the
      device. In the basic example below the commands are then parsed
      and programmed in the embedded bridge.
      Note that if both the NTF_SELF and NTF_MASTER bits are set, the
      command will be sent to both 'dev->master' and 'dev'. This allows
      user space to easily keep the embedded bridge and software bridge
      in sync.
      There is a slight complication when both flags are set and an
      error occurs. To resolve this, the rtnl handler clears the NTF_
      flag in the netlink ack to indicate which sets completed
      successfully. The add/del handlers will abort as soon as any
      error occurs.
      To support this, new net device ops were added to call into
      the device, and the existing bridging code was refactored
      to use them. There should be no required changes in user space
      to support the current bridge behavior.
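      To make the flag semantics above concrete, here is a minimal userspace sketch in C. It fills a PF_BRIDGE RTM_NEWNEIGH request with the chosen NTF_ bits in ndm_flags and models which device the description says should see the command. The struct fdb_req layout, fdb_targets() and the FDB_TO_* values are hypothetical names of our own, the NUD_NOARP entry state is an assumption, and actually sending the message over a NETLINK_ROUTE socket is not shown.

```c
#include <sys/socket.h>        /* AF_BRIDGE (== PF_BRIDGE) */
#include <linux/neighbour.h>   /* struct ndmsg, NDA_LLADDR, NTF_SELF, NTF_MASTER, NUD_* */
#include <linux/netlink.h>     /* struct nlmsghdr, NLM_F_* */
#include <linux/rtnetlink.h>   /* RTM_NEWNEIGH, RTA_* helpers */
#include <stdint.h>
#include <string.h>

/* Hypothetical request layout: netlink header, ndmsg, room for attributes. */
struct fdb_req {
    struct nlmsghdr nh;
    struct ndmsg ndm;
    char attrs[64];
};

/* Hypothetical model of the dispatch rule described above (not kernel code). */
enum { FDB_TO_MASTER = 1, FDB_TO_SELF = 2 };

static int fdb_targets(uint8_t ndm_flags)
{
    int targets = 0;

    /* NTF_MASTER, or neither flag (legacy tools): handled by dev->master. */
    if ((ndm_flags & NTF_MASTER) || !(ndm_flags & (NTF_SELF | NTF_MASTER)))
        targets |= FDB_TO_MASTER;
    /* NTF_SELF: handled by the device itself, e.g. its embedded bridge. */
    if (ndm_flags & NTF_SELF)
        targets |= FDB_TO_SELF;
    return targets;
}

/* Fill an RTM_NEWNEIGH request adding an FDB entry for `mac` on `ifindex`. */
static void build_fdb_add(struct fdb_req *req, int ifindex,
                          const uint8_t mac[6], uint8_t ndm_flags)
{
    struct rtattr *rta;

    memset(req, 0, sizeof(*req));
    req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg));
    req->nh.nlmsg_type = RTM_NEWNEIGH;
    req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL | NLM_F_ACK;
    req->ndm.ndm_family = AF_BRIDGE;
    req->ndm.ndm_ifindex = ifindex;
    req->ndm.ndm_state = NUD_NOARP;      /* assumption: a non-aging entry */
    req->ndm.ndm_flags = ndm_flags;      /* NTF_SELF and/or NTF_MASTER */

    /* Append NDA_LLADDR right after the aligned ndmsg. */
    rta = (struct rtattr *)((char *)req + NLMSG_ALIGN(req->nh.nlmsg_len));
    rta->rta_type = NDA_LLADDR;
    rta->rta_len = RTA_LENGTH(6);
    memcpy(RTA_DATA(rta), mac, 6);
    req->nh.nlmsg_len = NLMSG_ALIGN(req->nh.nlmsg_len) + RTA_ALIGN(rta->rta_len);
}
```

Setting both bits in one request is what lets a single user space call keep the software and embedded bridges in sync, per the text above.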
      A basic setup with an SR-IOV enabled NIC looks like this:
                veth0  veth2
                  |      |
                ------------
                |  bridge0 |   <---- software bridging
                ------------
        ethx.y      ethx
          VF         PF
           \         \          <---- propagate FDB entries to HW
           \         \
        --------------------
        |  Embedded Bridge |    <---- hardware offloaded switching
        --------------------
      In this case the embedded bridge must be managed to allow 'veth0'
      to communicate with 'ethx.y' correctly. At present, drivers managing
      the embedded bridge either send frames onto the network, where they
      are then dropped by the switch, or the embedded bridge floods
      these frames. With this patch we have a mechanism to manage the
      embedded bridge correctly from user space. This example is specific
      to SR-IOV, but replacing the VF with another PF or dropping this
      into the DSA framework generates similar management issues.
      Example session using the 'br' [1] tool to add, dump, and then
      delete a MAC address with a new "embedded" option and an enabled
      ixgbe driver:
      # br fdb add 22:35:19:ac:60:59 dev eth3
      # br fdb
      port    mac addr                flags
      veth0   22:35:19:ac:60:58       static
      veth0   9a:5f:81:f7:f6:ec       local
      eth3    00:1b:21:55:23:59       local
      eth3    22:35:19:ac:60:59       static
      veth0   22:35:19:ac:60:57       static
      # br fdb add 22:35:19:ac:60:59 embedded dev eth3
      # br fdb
      port    mac addr                flags
      veth0   22:35:19:ac:60:58       static
      veth0   9a:5f:81:f7:f6:ec       local
      eth3    00:1b:21:55:23:59       local
      eth3    22:35:19:ac:60:59       static
      veth0   22:35:19:ac:60:57       static
      eth3    22:35:19:ac:60:59       local embedded
      # br fdb del 22:35:19:ac:60:59 embedded dev eth3
      I only added a couple of lines to 'br' to set the flags correctly. In
      my opinion, the merit of this patch is that embedded and SW bridges
      can now both be modeled correctly in user space using very nearly
      the same message passing.
      [1] The 'br' tool was published as an RFC here and will be renamed 'bridge'.
      Thanks to Jamal Hadi Salim, Stephen Hemminger and Ben Hutchings for
      valuable feedback, suggestions, and review.
      v2: fixed api descriptions and error case with both NTF_SELF and
          NTF_MASTER set plus updated patch description.
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 21 Feb, 2012 1 commit
    • rtnetlink: Fix problem with buffer allocation · 115c9b81
      Greg Rose authored
      Implement a new netlink attribute type, IFLA_EXT_MASK. The mask
      is a 32-bit value that can be used to indicate to the kernel that
      certain extended ifinfo values are requested by the user application.
      At this time the only mask value defined is RTEXT_FILTER_VF, which
      indicates that the user wants the ifinfo dump to send information
      about the VFs belonging to the interface.
      This patch fixes a bug in which certain applications do not have
      large enough buffers to accommodate the extra information returned
      by the kernel with large numbers of SR-IOV virtual functions.
      Those applications will not send the new netlink attribute with
      the interface info dump request netlink messages, so they will
      not get unexpectedly large request buffers returned by the kernel.
      This patch also modifies the rtnl_calcit function to traverse the
      list of net devices and compute the minimum buffer size that can
      hold the info dumps of all matching devices, based upon the filter
      passed in via the new netlink attribute filter mask. If no filter
      mask is sent, the buffer allocation defaults to NLMSG_GOODSIZE.
      With this change it is possible to add yet to be defined netlink
      attributes to the dump request which should make it fairly extensible
      in the future.
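      A sketch of the opt-in from the application side, assuming uapi headers from a kernel that has this patch: an RTM_GETLINK dump request that appends IFLA_EXT_MASK carrying RTEXT_FILTER_VF. Passing a mask of 0 omits the attribute, which corresponds to the old behaviour described above. struct link_dump_req and build_link_dump() are hypothetical names.

```c
#include <sys/socket.h>        /* AF_UNSPEC */
#include <linux/if_link.h>     /* IFLA_EXT_MASK */
#include <linux/netlink.h>     /* struct nlmsghdr, NLM_F_* */
#include <linux/rtnetlink.h>   /* struct ifinfomsg, RTM_GETLINK, RTEXT_FILTER_VF */
#include <stdint.h>
#include <string.h>

struct link_dump_req {
    struct nlmsghdr nh;
    struct ifinfomsg ifm;
    char attrs[16];
};

/* Build an interface dump request; ext_mask == 0 means "no IFLA_EXT_MASK". */
static void build_link_dump(struct link_dump_req *req, uint32_t ext_mask)
{
    memset(req, 0, sizeof(*req));
    req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
    req->nh.nlmsg_type = RTM_GETLINK;
    req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req->ifm.ifi_family = AF_UNSPEC;

    if (ext_mask != 0) {
        /* Append a u32 IFLA_EXT_MASK attribute after the aligned ifinfomsg. */
        struct rtattr *rta =
            (struct rtattr *)((char *)req + NLMSG_ALIGN(req->nh.nlmsg_len));
        rta->rta_type = IFLA_EXT_MASK;
        rta->rta_len = RTA_LENGTH(sizeof(ext_mask));
        memcpy(RTA_DATA(rta), &ext_mask, sizeof(ext_mask));
        req->nh.nlmsg_len = NLMSG_ALIGN(req->nh.nlmsg_len) + RTA_ALIGN(rta->rta_len);
    }
}
```

An application that can handle large replies would call build_link_dump(&req, RTEXT_FILTER_VF); older applications simply never send the attribute.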
      Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 08 Jul, 2011 1 commit
  6. 21 Jun, 2011 1 commit
    • net: dcbnl, add multicast group for DCB · 314b4778
      John Fastabend authored
      Now that dcbnl is being used in many cases by more
      than a single agent, it is beneficial to be notified
      when some entity, either a driver or user space, has
      changed the DCB attributes.
      Today applications either end up polling the interface
      or relying on a user space database to maintain the DCB
      state and post events. Polling is a poor solution for
      obvious reasons. And relying on a user space database
      has its own downside. Namely, it has created strange
      boot dependencies, requiring the database to be populated
      before any applications dependent on DCB attributes
      start, or else the application goes into a polling loop.
      Populating the database requires negotiating link
      settings with the peer and can take anywhere from less
      than a second up to a few seconds, depending on the switch.
      Perhaps more importantly, if another application or an
      embedded agent sets a DCB link attribute, the database
      has no way of knowing other than polling the kernel.
      This prevents applications from responding quickly to
      changes in link events, which, at least in the FCoE case
      and probably for any other protocol expecting a lossless
      link, may result in I/O errors.
      By adding a multicast group for DCB, we have a clean way
      to disseminate kernel DCB link attributes up to user
      space. This avoids the need for user space to maintain
      a coherent database and distribute events that potentially
      do not reflect the current link state.
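      A minimal sketch of a user space listener for the new group, assuming uapi headers that define RTNLGRP_DCB: it joins the group with NETLINK_ADD_MEMBERSHIP on a NETLINK_ROUTE socket instead of polling. open_dcb_listener() and rtnl_group_mask() are hypothetical helpers; the latter only shows how an rtnetlink group number maps to the legacy nl_groups bitmask. Decoding the received DCB messages is not shown.

```c
#include <sys/socket.h>
#include <linux/netlink.h>     /* struct sockaddr_nl, NETLINK_ADD_MEMBERSHIP */
#include <linux/rtnetlink.h>   /* RTNLGRP_DCB, RTNLGRP_* and RTMGRP_* */
#include <string.h>
#include <unistd.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270        /* Linux value, in case libc lacks the define */
#endif

/* Legacy nl_groups bit for a group number; only groups 1..32 fit. */
static unsigned int rtnl_group_mask(unsigned int group)
{
    return (group >= 1 && group <= 32) ? 1u << (group - 1) : 0;
}

/* Open a NETLINK_ROUTE socket subscribed to RTNLGRP_DCB; fd, or -1 on error. */
static int open_dcb_listener(void)
{
    struct sockaddr_nl sa;
    int group = RTNLGRP_DCB;
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

    if (fd < 0)
        return -1;
    memset(&sa, 0, sizeof(sa));
    sa.nl_family = AF_NETLINK;   /* nl_pid 0: kernel assigns the port id */
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
                   &group, sizeof(group)) < 0) {
        close(fd);
        return -1;
    }
    return fd;   /* DCB notifications, when sent, can be read with recv() */
}
```

Using NETLINK_ADD_MEMBERSHIP rather than the nl_groups bitmask keeps the listener working for group numbers above 32.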
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 15 Nov, 2010 1 commit
    • net: rtnetlink.h -- only include linux/netdevice.h when used by the kernel · 3b42a96d
      Andy Whitcroft authored
      The commit below added a new helper dev_ingress_queue to cleanly obtain the
      ingress queue pointer.  This necessitated including 'linux/netdevice.h':
        commit 24824a09
        Author: Eric Dumazet <eric.dumazet@gmail.com>
        Date:   Sat Oct 2 06:11:55 2010 +0000
          net: dynamic ingress_queue allocation
      However, this include triggers issues for userspace applications
      that use the rtnetlink interfaces. Commonly these include both
      'net/if.h' and 'linux/rtnetlink.h', leading to compiler errors like these:
        In file included from /usr/include/linux/netdevice.h:28:0,
                         from /usr/include/linux/rtnetlink.h:9,
                         from t.c:2:
        /usr/include/linux/if.h:135:8: error: redefinition of ‘struct ifmap’
        /usr/include/net/if.h:112:8: note: originally defined here
        /usr/include/linux/if.h:169:8: error: redefinition of ‘struct ifreq’
        /usr/include/net/if.h:127:8: note: originally defined here
        /usr/include/linux/if.h:218:8: error: redefinition of ‘struct ifconf’
        /usr/include/net/if.h:177:8: note: originally defined here
      The new helper is only defined for the kernel and is protected by
      __KERNEL__; therefore, we can simply move the include down into the
      same protected section.
      Signed-off-by: Andy Whitcroft <apw@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 05 Oct, 2010 2 commits
  9. 16 Sep, 2010 1 commit
  10. 08 Sep, 2010 1 commit
  11. 22 Jul, 2010 1 commit
  12. 11 May, 2010 1 commit
    • ipv6: ip6mr: support multiple tables · d1db275d
      Patrick McHardy authored
      This patch adds support for multiple independent multicast routing instances,
      named "tables".
      Userspace multicast routing daemons can bind to a specific table instance by
      issuing a setsockopt call using a new option MRT6_TABLE. The table number is
      stored in the raw socket data and affects all following ip6mr setsockopt(),
      getsockopt() and ioctl() calls. By default, a single table (RT6_TABLE_DFLT)
      is created with a default routing rule pointing to it. Newly created pim6reg
      devices have the table number appended ("pim6regX"), with the exception of
      devices created in the default table, which are named just "pim6reg" for
      compatibility reasons.
      Packets are directed to a specific table instance using routing rules,
      similar to how regular routing rules work. Currently iif, oif and mark
      are supported as keys, source and destination addresses could be supported
      Example usage:
      - bind pimd/xorp/... to a specific table:
      uint32_t table = 123;
      setsockopt(fd, SOL_IPV6, MRT6_TABLE, &table, sizeof(table));
      - create routing rules directing packets to the new table:
      # ip -6 mrule add iif eth0 lookup 123
      # ip -6 mrule add oif eth0 lookup 123
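      The pim6reg naming rule described above can be sketched as follows. This is a hypothetical helper, not kernel code, and it assumes the default table RT6_TABLE_DFLT has the same value as RT_TABLE_MAIN (254), which is how we understand the kernel to define it. With IFNAMSIZ at 16, "pim6reg" plus up to 8 digits still fits.

```c
#include <linux/rtnetlink.h>   /* RT_TABLE_MAIN */
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: name of the pim6reg device for a table; returns its length. */
static int pim6reg_name(uint32_t table, char *buf, size_t len)
{
    if (table == RT_TABLE_MAIN)   /* assumed equal to RT6_TABLE_DFLT */
        return snprintf(buf, len, "pim6reg");          /* kept for compatibility */
    return snprintf(buf, len, "pim6reg%u", (unsigned int)table);
}
```

For the table 123 used in the example above, this yields "pim6reg123".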
      Signed-off-by: Patrick McHardy <kaber@trash.net>
  13. 26 Apr, 2010 1 commit
    • net: rtnetlink: decouple rtnetlink address families from real address families · 25239cee
      Patrick McHardy authored
      Decouple rtnetlink address families from real address families in socket.h to
      be able to add rtnetlink interfaces to code that is not a real address family
      without increasing AF_MAX/NPROTO.
      This will be used to add support for multicast route dumping from all tables
      as the proc interface can't be extended to support anything but the main table
      without breaking compatibility.
      This partially undoes the patch to introduce independent families for routing
      rules and converts ipmr routing rules to a new rtnetlink family. Similar to
      that patch, values up to 127 are reserved for real address families; values
      above that may be used arbitrarily.
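      A sketch of the numbering split, assuming uapi headers that define RTNL_FAMILY_IPMR and RTNL_FAMILY_IP6MR (both above 127 there): a multicast route dump request sets rtm_family to the rtnetlink-only family rather than to AF_INET. is_real_address_family() and build_mroute_dump() are hypothetical helpers, and the dump itself relies on the later multicast-route dumping support this commit prepares for.

```c
#include <sys/socket.h>        /* AF_INET6, AF_MAX */
#include <linux/netlink.h>
#include <linux/rtnetlink.h>   /* struct rtmsg, RTM_GETROUTE, RTNL_FAMILY_* */
#include <string.h>

struct mroute_dump_req {
    struct nlmsghdr nh;
    struct rtmsg rtm;
};

/* Values up to 127 are reserved for real (socket.h) address families. */
static int is_real_address_family(int family)
{
    return family >= 0 && family <= 127;
}

/* Build a route dump request for an rtnetlink family such as RTNL_FAMILY_IPMR. */
static void build_mroute_dump(struct mroute_dump_req *req, unsigned char family)
{
    memset(req, 0, sizeof(*req));
    req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg));
    req->nh.nlmsg_type = RTM_GETROUTE;
    req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req->rtm.rtm_family = family;
}
```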
      Signed-off-by: Patrick McHardy <kaber@trash.net>
  14. 25 Feb, 2010 1 commit
    • net: Add checking to rcu_dereference() primitives · a898def2
      Paul E. McKenney authored
      Update rcu_dereference() primitives to use new lockdep-based
      checking. The rcu_dereference() in __in6_dev_get() may be
      protected either by rcu_read_lock() or RTNL, per Eric Dumazet.
      The rcu_dereference() in __sk_free() is protected by the fact
      that it is never reached if an update could change it.  Check
      for this by using rcu_dereference_check() to verify that the
      struct sock's ->sk_wmem_alloc counter is zero.
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1266887105-1528-5-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  15. 23 Dec, 2009 1 commit
    • net: Add rtnetlink init_rcvwnd to set the TCP initial receive window · 31d12926
      laurent chavey authored
      Add rtnetlink init_rcvwnd to set the TCP initial receive window size
      advertised by passive and active TCP connections.
      The current Linux TCP implementation limits the advertised TCP initial
      receive window to the one prescribed by slow start. For short-lived
      TCP connections used for transaction-type traffic (e.g. HTTP
      requests), bounding the advertised TCP initial receive window results
      in increased latency to complete the transaction.
      Setting the initial congestion window is already supported
      using rtnetlink init_cwnd, but the feature is useless without the
      ability to set a larger TCP initial receive window.
      The rtnetlink init_rcvwnd allows increasing the TCP initial receive
      window, allowing TCP connections to advertise a larger TCP receive
      window than the one bounded by slow start.
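      As a sketch of where these per-route values travel, the code below builds a nested RTA_METRICS attribute holding RTAX_INITCWND and RTAX_INITRWND and reads one back with the standard RTA_* macros. put_u32_attr(), build_metrics() and find_metric() are hypothetical helpers; in a real RTM_NEWROUTE request the nest would follow the struct rtmsg. iproute2 exposes the same metric as the `initrwnd` route option.

```c
#include <linux/rtnetlink.h>   /* struct rtattr, RTA_*, RTAX_INITCWND, RTAX_INITRWND */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append a u32 attribute at buf + off; returns the next aligned offset. */
static size_t put_u32_attr(unsigned char *buf, size_t off,
                           unsigned short type, uint32_t val)
{
    struct rtattr *rta = (struct rtattr *)(buf + off);

    rta->rta_type = type;
    rta->rta_len = RTA_LENGTH(sizeof(val));
    memcpy(RTA_DATA(rta), &val, sizeof(val));
    return off + RTA_ALIGN(rta->rta_len);
}

/* Build RTA_METRICS { RTAX_INITCWND, RTAX_INITRWND }; returns its total length. */
static size_t build_metrics(unsigned char *buf, uint32_t initcwnd, uint32_t initrwnd)
{
    struct rtattr *nest = (struct rtattr *)buf;
    size_t off = RTA_LENGTH(0);   /* skip the nest header itself */

    off = put_u32_attr(buf, off, RTAX_INITCWND, initcwnd);
    off = put_u32_attr(buf, off, RTAX_INITRWND, initrwnd);
    nest->rta_type = RTA_METRICS;
    nest->rta_len = (unsigned short)off;
    return off;
}

/* Look up one RTAX_* metric inside a nested RTA_METRICS attribute. */
static int find_metric(unsigned char *buf, unsigned short type, uint32_t *out)
{
    struct rtattr *nest = (struct rtattr *)buf;
    int len = RTA_PAYLOAD(nest);
    struct rtattr *rta;

    for (rta = (struct rtattr *)RTA_DATA(nest); RTA_OK(rta, len);
         rta = RTA_NEXT(rta, len)) {
        if (rta->rta_type == type) {
            memcpy(out, RTA_DATA(rta), sizeof(*out));
            return 1;
        }
    }
    return 0;
}
```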
      Signed-off-by: Laurent Chavey <chavey@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 16 Dec, 2009 1 commit
    • tcp: Revert per-route SACK/DSACK/TIMESTAMP changes. · bb5b7c11
      David S. Miller authored
      It creates a regression, triggering badness for SYN_RECV
      sockets, for example:
      [19148.022102] Badness at net/ipv4/inet_connection_sock.c:293
      [19148.022570] NIP: c02a0914 LR: c02a0904 CTR: 00000000
      [19148.023035] REGS: eeecbd30 TRAP: 0700   Not tainted  (2.6.32)
      [19148.023496] MSR: 00029032 <EE,ME,CE,IR,DR>  CR: 24002442  XER: 00000000
      [19148.024012] TASK = eee9a820[1756] 'privoxy' THREAD: eeeca000
      This is likely caused by the change in the 'estab' parameter
      passed to tcp_parse_options() when invoked by the functions
      in net/ipv4/tcp_minisocks.c
      But even if that is fixed, the ->conn_request() changes made in
      this patch series are fundamentally wrong. They try to use the
      listening socket's 'dst' to probe the route settings. The
      listening socket doesn't even have a route, and you can't
      get the right route (the child request one) until much later,
      after we set up all of the state, and it must be done by hand.
      This stuff really isn't ready, so the best thing to do is a
      full revert.  This reverts the following commits:
      6a2a2d6b
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 04 Nov, 2009 1 commit
  18. 29 Oct, 2009 4 commits
  19. 09 Sep, 2009 1 commit
  20. 20 Mar, 2009 1 commit
  21. 25 Feb, 2009 1 commit
    • netlink: change nlmsg_notify() return value logic · 1ce85fe4
      Pablo Neira Ayuso authored
      This patch changes the return value of nlmsg_notify() as follows:
      If NETLINK_BROADCAST_ERROR is set by any of the listeners and
      an error in the delivery happened, return the broadcast error;
      else if there are no listeners apart from the socket that
      requested a change with the echo flag, return the result of the
      unicast notification. Thus, with this patch, the unicast
      notification is handled in the same way as a broadcast listener
      that has set the NETLINK_BROADCAST_ERROR socket flag.
      This patch is useful in case the caller of nlmsg_notify()
      wants to know the result of the delivery of a netlink notification
      (including the broadcast delivery) and take action if the
      delivery failed. For example, ctnetlink can drop packets
      if the event delivery failed, to provide reliable logging and
      state-synchronization at the cost of dropping packets.
      This patch also modifies the rtnetlink code to ignore the return
      value of rtnl_notify() in all callers. Before this patch, the
      function rtnl_notify() returned the error of the unicast notification,
      which made rtnl_set_sk_err() report errors to all listeners. This
      is not of any help, since the origin of the change (the socket that
      requested the echoing) notices the ENOBUFS error if the notification
      fails and should resync itself.
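      The return-value rule above can be modeled as a small pure function. This is our reading of the description, not the kernel source: merge_notify_err() is hypothetical, mcast_err stands for the broadcast error surfaced by a NETLINK_BROADCAST_ERROR listener (-ESRCH meaning nobody is listening), and report means the requester asked for an echo. The listener side opts in with the real NETLINK_BROADCAST_ERROR socket option.

```c
#include <errno.h>
#include <sys/socket.h>
#include <linux/netlink.h>     /* NETLINK_BROADCAST_ERROR */

#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif

/* Hypothetical model of the described rule (not kernel code). */
static int merge_notify_err(int mcast_err, int unicast_err, int report)
{
    /* "No listeners" is not a delivery failure. */
    int err = (mcast_err == -ESRCH) ? 0 : mcast_err;

    /* Otherwise the echo to the requester decides, like a broadcast listener. */
    if (report && err == 0)
        err = unicast_err;
    return err;
}

/* Listener side: ask the kernel to report broadcast delivery failures. */
static int enable_broadcast_error(int netlink_fd)
{
    int one = 1;

    return setsockopt(netlink_fd, SOL_NETLINK, NETLINK_BROADCAST_ERROR,
                      &one, sizeof(one));
}
```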
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 30 Jan, 2009 1 commit
  23. 21 Nov, 2008 1 commit
  24. 23 Sep, 2008 1 commit
  25. 26 Jul, 2008 1 commit
  26. 20 Jul, 2008 1 commit
  27. 10 Jun, 2008 1 commit
  28. 03 Jun, 2008 1 commit
  29. 24 Apr, 2008 1 commit
    • [RTNETLINK]: Fix bogus ASSERT_RTNL warning · c9c1014b
      Patrick McHardy authored
      ASSERT_RTNL uses mutex_trylock to test whether the rtnl_mutex is
      held. This produces bogus warnings when running in atomic context,
      which happens, for example, when adding secondary unicast addresses
      through macvlan or vlan or when synchronizing multicast addresses
      from wireless devices.
      Mid-term, we might want to consider moving all address updates
      to process context, since the locking seems overly complicated;
      for now, just fix the bogus warning by changing ASSERT_RTNL to
      use mutex_is_locked().
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  30. 05 Feb, 2008 1 commit
  31. 28 Jan, 2008 2 commits
  32. 13 Nov, 2007 1 commit
  33. 11 Oct, 2007 1 commit
    • [IPv6]: Export userland ND options through netlink (RDNSS support) · 31910575
      Pierre Ynard authored
      As discussed before, this patch provides userland with a way to access
      relevant options in Router Advertisements, after they are processed
      and validated by the kernel. Extra options are processed in a generic
      way; this patch only exports RDNSS options described in RFC5006, but
      support to control which options are exported could be easily added.
      A new rtnetlink message type is defined to transport Neighbor
      Discovery options, along with optional context information. At the
      moment, only the address of the router sending an RDNSS option is
      included, but additional attributes may be defined later if needed
      by new use cases.
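      A minimal sketch of the consumer side, assuming the struct nduseroptmsg header from <linux/rtnetlink.h> with the raw ND options following it: walk the options and decode the first RDNSS option using the RFC 5006 layout (type 25, length in 8-octet units, 4-byte lifetime at offset 4, 16-byte addresses from offset 8). find_rdnss() and RDNSS_OPT_TYPE are hypothetical names; receiving the RTM_NEWNDUSEROPT message is not shown.

```c
#include <arpa/inet.h>         /* ntohl */
#include <linux/rtnetlink.h>   /* struct nduseroptmsg */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RDNSS_OPT_TYPE 25      /* RDNSS option type from RFC 5006 */

/* Return the address count of the first RDNSS option, or -1 if none/malformed. */
static int find_rdnss(const struct nduseroptmsg *m, uint32_t *lifetime,
                      const uint8_t **addrs)
{
    const uint8_t *opt = (const uint8_t *)(m + 1);   /* options follow the header */
    size_t left = m->nduseropt_opts_len;

    while (left >= 8) {
        size_t optlen = (size_t)opt[1] * 8;   /* length is in units of 8 octets */

        if (optlen == 0 || optlen > left)
            return -1;                        /* malformed option */
        if (opt[0] == RDNSS_OPT_TYPE && opt[1] >= 3 && (opt[1] - 1) % 2 == 0) {
            uint32_t lt;

            memcpy(&lt, opt + 4, sizeof(lt));
            *lifetime = ntohl(lt);            /* network byte order on the wire */
            *addrs = opt + 8;                 /* 16-byte IPv6 addresses follow */
            return (opt[1] - 1) / 2;
        }
        opt += optlen;
        left -= optlen;
    }
    return -1;
}
```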
      Signed-off-by: Pierre Ynard <linkfanel@yahoo.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  34. 31 Aug, 2007 1 commit
  35. 11 Jul, 2007 1 commit