1. 07 Feb, 2018 1 commit
  2. 09 May, 2017 1 commit
  3. 03 Nov, 2016 1 commit
  4. 27 Oct, 2016 3 commits
    • genetlink: mark families as __ro_after_init · 56989f6d
      Johannes Berg authored
      Now genl_register_family() is the only thing (other than the
      users themselves, perhaps, but I didn't find any doing that)
      writing to the family struct.
      
      In all families that I found, genl_register_family() is only
      called from __init functions (some indirectly, in which case
      I've added __init annotations to clarify things), so all can
      actually be marked __ro_after_init.
      
      This protects the data structure from accidental corruption.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • genetlink: statically initialize families · 489111e5
      Johannes Berg authored
      Instead of providing macros/inline functions to initialize
      the families, make all users initialize them statically and
      get rid of the macros.
      
      This reduces the kernel code size by about 1.6k on x86-64
      (with allyesconfig).
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • genetlink: no longer support using static family IDs · a07ea4d9
      Johannes Berg authored
      Static family IDs have never really been used; the only
      use case was the workaround I introduced for those users
      that assumed their family ID was also their multicast
      group ID.
      
      Additionally, because static family IDs would never be
      reserved by the generic netlink code, using a relatively
      low ID would only work for built-in families that can be
      registered immediately after generic netlink is started,
      which is basically only the control family (apart from
      the workaround code, which I also had to add code for so
      it would reserve those IDs).
      
      Thus, anything other than GENL_ID_GENERATE is flawed and
      luckily not used except in the cases I mentioned. Move
      those workarounds into a few lines of code, and then get
      rid of GENL_ID_GENERATE entirely, making it more robust.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
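The combined effect of the two commits above (static initialization, IDs assigned only at registration) can be sketched in user space. This is a reduced mock, not the kernel API: `struct demo_family`, `demo_register_family()`, the field values, and the starting ID of 16 are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for struct genl_family. */
struct demo_family {
	int id;                 /* filled in by registration, never by the user */
	const char *name;
	unsigned int version;
	unsigned int maxattr;
};

#define DEMO_MIN_ID        16   /* assumed start of the dynamic ID range */
#define DEMO_MAX_FAMILIES   8

static struct demo_family *demo_families[DEMO_MAX_FAMILIES];

/* Statically initialized family: no init macro, no ID chosen by the user. */
static struct demo_family demo_taskstats_family = {
	.name    = "TASKSTATS",  /* illustrative values */
	.version = 1,
	.maxattr = 4,
};

/*
 * Assign the first free ID to the family.
 * Returns 0 on success, -1 if the family already carries an ID
 * (caller-chosen IDs are rejected in this sketch) or the table is full.
 */
static int demo_register_family(struct demo_family *family)
{
	size_t i;

	assert(family != NULL);
	if (family->id != 0)
		return -1;

	for (i = 0; i < DEMO_MAX_FAMILIES; i++) {
		if (demo_families[i] == NULL) {
			demo_families[i] = family;
			family->id = DEMO_MIN_ID + (int)i;
			return 0;
		}
	}
	return -1;
}
```

Because registration is the only writer of the structure in this model, the family object could be made read-only once initialization has finished, which is the property the `__ro_after_init` commit relies on.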
  5. 24 Apr, 2016 1 commit
  6. 18 Jan, 2015 1 commit
    • netlink: make nlmsg_end() and genlmsg_end() void · 053c095a
      Johannes Berg authored
      Contrary to common expectations for an "int" return, these functions
      return only a positive value -- if used correctly they cannot even
      return 0 because the message header will necessarily be in the skb.
      
      This makes the very common pattern of
      
        if (genlmsg_end(...) < 0) { ... }
      
      be a whole bunch of dead code. Many places also simply do
      
        return nlmsg_end(...);
      
      and the caller is expected to deal with it.
      
      This also commonly (at least for me) causes errors, because it is very
      common to write
      
        if (my_function(...))
          /* error condition */
      
      and if my_function() does "return nlmsg_end()" this is of course wrong.
      
      Additionally, there's not a single place in the kernel that actually
      needs the message length returned, and if anyone needs it later then
      it'll be very easy to just use skb->len there.
      
      Remove this, and make the functions void. This removes a bunch of dead
      code as described above. The patch adds lines because I did
      
      -	return nlmsg_end(...);
      +	nlmsg_end(...);
      +	return 0;
      
      I could have preserved all the function's return values by returning
      skb->len, but instead I've audited all the places calling the affected
      functions and found that none cared. A few places actually compared
      the return value with <= 0 in dump functionality, but that could just
      be changed to < 0 with no change in behaviour, so I opted for the more
      efficient version.
      
      One instance of the error I've made numerous times now is also present
      in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
      check for <0 or <=0 and thus broke out of the loop every single time.
      I've preserved this since it will (I think) have caused the messages to
      userspace to be formatted differently with just a single message for
      every SKB returned to userspace. It's possible that this isn't needed
      for the tools that actually use this, but I don't even know what they
      are so couldn't test that changing this behaviour would be acceptable.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
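The pitfall described in this commit can be reproduced with a small user-space mock. The names below (`struct demo_msg`, `demo_end_old()`, `demo_end_new()` and the two `fill_*` helpers) are hypothetical stand-ins, not the netlink functions themselves; the sketch only assumes that the old function returned the positive message length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb: only the message length matters here. */
struct demo_msg {
	size_t len;
};

/* Old style: finalize the message and return its (always positive) length. */
static int demo_end_old(struct demo_msg *msg)
{
	assert(msg != NULL);
	msg->len += 4;          /* the header is always part of the message */
	return (int)msg->len;   /* never 0, never negative */
}

/* New style: same work, but no return value that can be misread. */
static void demo_end_new(struct demo_msg *msg)
{
	assert(msg != NULL);
	msg->len += 4;
}

/* Old caller pattern: hands the positive length up as if it were a status. */
static int fill_old(struct demo_msg *msg)
{
	return demo_end_old(msg);
}

/* Converted caller pattern: 0 means success, as most callers assume. */
static int fill_new(struct demo_msg *msg)
{
	demo_end_new(msg);
	return 0;
}

/* Returns 1 if the usual "if (fn(...)) error" test would treat ret as failure. */
static int looks_like_error(int ret)
{
	return ret != 0;
}
```

With the old pattern a caller that writes `if (fill_old(&msg))` takes the error path on every successful call; with the converted pattern the same check behaves as expected.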
  7. 19 Nov, 2014 1 commit
  8. 26 Aug, 2014 1 commit
  9. 19 Nov, 2013 1 commit
  10. 14 Nov, 2013 2 commits
  11. 13 Nov, 2013 2 commits
  12. 05 Oct, 2012 1 commit
  13. 27 Sep, 2012 1 commit
  14. 18 Sep, 2012 1 commit
    • userns: Convert taskstats to handle the user and pid namespaces. · 4bd6e32a
      Eric W. Biederman authored
      - Explicitly limit exit task stat broadcast to the initial user and
        pid namespaces, as it is already limited to the initial network
        namespace.
      
      - For broadcast task stats explicitly generate all of the identifiers
        in terms of the initial user namespace and the initial pid
        namespace.
      
      - For request stats report them in terms of the current user namespace
        and the current pid namespace.  Netlink messages are delivered
        synchronously to the kernel allowing us to get the user namespace
        and the pid namespace from the current task.
      
      - Pass the namespaces for representing pids and uids and gids
        into bacct_add_task.
      
      Cc: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
  15. 10 Sep, 2012 1 commit
  16. 31 Jul, 2012 1 commit
  17. 20 Sep, 2011 1 commit
  18. 04 Aug, 2011 2 commits
  19. 26 Jul, 2011 1 commit
  20. 28 Jun, 2011 1 commit
    • taskstats: don't allow duplicate entries in listener mode · 26c4caea
      Vasiliy Kulikov authored
      Currently a single process may register exit handlers an unlimited number
      of times.  This may lead to a bloated listeners chain and very slow process
      terminations.
      
      E.g., after 10 million TASKSTATS_CMD_ATTR_REGISTER_CPUMASK requests are
      sent, ~300 MB of kernel memory is consumed by the handlers chain and
      "time id" shows 2-7 seconds instead of the normal 0.003.  This makes it
      possible to exhaust all kernel memory and to eat up much CPU time by
      triggering numerous exits on a single CPU.
      
      The patch limits the number of times a single process may register
      itself on a single CPU to one.
      
      One little issue is left unfixed: as taskstats_exit() is called before
      exit_files() in do_exit(), the orphaned listener entry (if it was not
      explicitly deregistered) is kept until the next exit() of some process and
      the implicit deregistration in send_cpu_listeners().  So, if a process
      that registered itself as a listener exits and the next spawned process
      gets the same pid, it would inherit the taskstats attributes.
      Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
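The "register at most once per CPU" rule can be sketched as a list insert that first scans for the pid. This is a user-space model under stated assumptions: `struct demo_listener`, `demo_add_listener()` and `demo_free_listeners()` are hypothetical names, a single list stands in for one CPU's listener list, and treating a repeated registration as a silent success is a choice of this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical listener record; the kernel's real structure differs. */
struct demo_listener {
	int pid;
	struct demo_listener *next;
};

/* Stands in for the listener list of a single CPU. */
struct demo_listener_list {
	struct demo_listener *head;
	size_t count;
};

/*
 * Register pid on the list.
 * Returns 0 on success or if pid is already present (the duplicate is
 * ignored so the chain cannot grow), -1 on allocation failure.
 */
static int demo_add_listener(struct demo_listener_list *list, int pid)
{
	struct demo_listener *l;

	assert(list != NULL);
	for (l = list->head; l != NULL; l = l->next)
		if (l->pid == pid)
			return 0;

	l = malloc(sizeof(*l));
	if (l == NULL)
		return -1;
	l->pid = pid;
	l->next = list->head;
	list->head = l;
	list->count++;
	return 0;
}

/* Release every entry and reset the list. */
static void demo_free_listeners(struct demo_listener_list *list)
{
	assert(list != NULL);
	while (list->head != NULL) {
		struct demo_listener *next = list->head->next;

		free(list->head);
		list->head = next;
	}
	list->count = 0;
}
```

The scan makes each registration linear in the number of distinct listeners, but it bounds the list length by the number of registering processes rather than by the number of requests.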
  21. 24 Mar, 2011 1 commit
  22. 13 Jan, 2011 1 commit
    • taskstats: use better ifdef for alignment · 9ab020cf
      Jeff Mahoney authored
      Commit 4be2c95d ("taskstats: pad taskstats netlink response for aligment
      issues on ia64") added a null field to align the taskstats structure but
      the discussion centered around ia64.  The issue exists on other platforms
      with inefficient unaligned access and adding them piecemeal would be an
      unmaintainable mess.
      
      This patch uses Dave Miller's suggestion of using a combination of
      CONFIG_64BIT && !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS to determine
      whether alignment is needed.
      
      Note that this will cause breakage on those platforms with applications
      like iotop which had hard-coded offsets into the packet to access the
      taskstats structure.
      
      The message seen on systems without the alignment fixes looks like: kernel
      unaligned access to 0xe000023879dca9bc, ip=0xa000000100133d10
      
      The addresses may vary but resolve to locations inside __delayacct_add_tsk.
      
      iotop makes what I'd call unreasonable assumptions about the contents of a
      netlink genetlink packet containing generic attributes.  They're typed and
      have headers that specify value lengths, so the client can (should)
      identify and skip the ones the client doesn't understand.
      
      The kernel, as of version 2.6.36, presented a packet like so:
      +--------------------------------+
      | genlmsghdr - 4 bytes           |
      +--------------------------------+
      | NLA header - 4 bytes           | /* Aggregate header */
      +-+------------------------------+
      | | NLA header - 4 bytes         | /* PID header */
      | +------------------------------+
      | | pid/tgid   - 4 bytes         |
      | +------------------------------+
      | | NLA header - 4 bytes         | /* stats header */
      | +------------------------------+ <- oops. aligned on 4 byte boundary
      | | struct taskstats - 328 bytes |
      +-+------------------------------+
      
      The iotop code expects that the kernel will behave as it did then,
      assuming that the packet format is set in stone.  The format is set in
      stone, but the packet offsets are not.  There's nothing in the packet
      format that guarantees that the packet will be sent in exactly the same
      way.  The attribute contents are set (or versioned) and the aggregate
      contents are set but they can be anywhere in the packet.
      
      The issue here isn't that an unaligned structure gets passed to userspace,
      it's that the NLA infrastructure has something of a weakness: The 4 byte
      attribute header may force the payload to be unaligned.  The taskstats
      structure is created at an unaligned location and then 64-bit values are
      operated on inside the kernel, so the unaligned access warnings get
      spewed everywhere.
      
      It's possible to use the unaligned access API to operate on the structure
      in the kernel but it seems like a wasted effort to work around userspace
      code that isn't following the packet format.  Any new additions would also
      need to be worked around.  It's a maintenance nightmare.
      
      The conclusion of the earlier discussion seemed to be "ok fine, if we have
      to break it, don't break it on arches that don't have the problem." Dave
      pointed out that the unaligned access problem doesn't only exist on ia64,
      but also on other 64-bit arches that don't have efficient unaligned access
      and it should be fixed there as well.  The committed version of the patch
      and this addition keep with the conclusion of that discussion not to break
      it unnecessarily, which the pid padding and the packet padding fixes did
      do.  x86_64 and powerpc don't suffer this problem so they shouldn't suffer
      the solution.  Other 64-bit architectures do and will, though.
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Reported-by: David S. Miller <davem@davemloft.net>
      Acked-by: David S. Miller <davem@davemloft.net>
      Cc: Dan Carpenter <error27@gmail.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Florian Mickler <florian@mickler.org>
      Cc: Guillaume Chazarain <guichaz@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
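The argument that a client should walk the typed, length-prefixed attributes instead of relying on a fixed offset can be sketched as follows. This is a user-space model, not libnl or kernel code: `struct demo_attr` only mirrors the shape of an attribute header, the `DEMO_TYPE_*` numbers are hypothetical, and the attributes are laid out flat rather than nested inside an aggregate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ALIGNTO 4u
#define DEMO_ALIGN(len) (((len) + DEMO_ALIGNTO - 1) & ~(DEMO_ALIGNTO - 1))

/* Hypothetical attribute types for this sketch. */
enum { DEMO_TYPE_PAD = 0, DEMO_TYPE_PID = 1, DEMO_TYPE_STATS = 2 };

/* Same shape as a netlink attribute header: total length, then type. */
struct demo_attr {
	uint16_t len;   /* header + payload, excluding trailing padding */
	uint16_t type;
};

/* Append one attribute at offset off; returns the offset of the next one. */
static size_t demo_put_attr(unsigned char *buf, size_t off, uint16_t type,
			    const void *payload, uint16_t payload_len)
{
	struct demo_attr hdr;

	assert(buf != NULL);
	hdr.len = (uint16_t)(sizeof(struct demo_attr) + payload_len);
	hdr.type = type;
	memcpy(buf + off, &hdr, sizeof(hdr));
	if (payload_len > 0)
		memcpy(buf + off + sizeof(hdr), payload, payload_len);
	return off + DEMO_ALIGN(hdr.len);
}

/*
 * Return a pointer to the payload of the first attribute of type wanted,
 * or NULL if it is absent or the buffer is malformed. Attributes of any
 * other type, including padding, are skipped by their declared length.
 */
static const void *demo_find_attr(const unsigned char *buf, size_t buflen,
				  uint16_t wanted, size_t *payload_len)
{
	size_t off = 0;

	assert(buf != NULL && payload_len != NULL);
	while (off + sizeof(struct demo_attr) <= buflen) {
		struct demo_attr hdr;

		memcpy(&hdr, buf + off, sizeof(hdr));   /* no unaligned reads */
		if (hdr.len < sizeof(hdr) || hdr.len > buflen - off)
			return NULL;
		if (hdr.type == wanted) {
			*payload_len = hdr.len - sizeof(hdr);
			return buf + off + sizeof(hdr);
		}
		off += DEMO_ALIGN(hdr.len);
	}
	return NULL;
}
```

A reader written this way finds the statistics payload whether or not a padding attribute precedes it, so the payload offset can change without breaking the client.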
  23. 23 Dec, 2010 1 commit
    • taskstats: pad taskstats netlink response for aligment issues on ia64 · 4be2c95d
      Jeff Mahoney authored
      The taskstats structure is internally aligned on 8 byte boundaries but the
      layout of the aggregate reply, with two NLA headers and the pid (each 4
      bytes), actually forces the entire structure to be unaligned.  This causes
      the kernel to issue unaligned access warnings on some architectures like
      ia64.  Unfortunately, some software out there doesn't properly unroll the
      NLA packet and assumes that the start of the taskstats structure will
      always be 20 bytes from the start of the netlink payload.  Aligning the
      start of the taskstats structure breaks this software, which we don't
      want.  So, for now the alignment only happens on architectures that
      require it and those users will have to update to fixed versions of those
      packages.  Space is reserved in the packet only when needed.  This ifdef
      should be removed in several years e.g.  2012 once we can be confident
      that fixed versions are installed on most systems.  We add the padding
      before the aggregate since the aggregate is already a defined type.
      
      Commit 85893120 ("delayacct: align to 8 byte boundary on 64-bit systems")
      previously addressed the alignment issues by padding out the pid field.
      This was supposed to be a compatible change but the circumstances
      described above mean that it wasn't.  This patch backs out that change,
      since it was a hack, and introduces a new NULL attribute type to provide
      the padding.  Padding the response with 4 bytes avoids allocating an
      aligned taskstats structure and copying it back.  Since the structure
      weighs in at 328 bytes, it's too big to do it on the stack.
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Reported-by: Brian Rogers <brian@xyzw.org>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Guillaume Chazarain <guichaz@gmail.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
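The offset arithmetic behind the "20 bytes" figure can be checked with a few lines of C. The sizes are the ones quoted in these commit messages (4-byte genlmsghdr, 4-byte NLA headers, 4-byte pid); the helper names are hypothetical, and the sketch assumes the netlink payload itself starts on an 8-byte boundary.

```c
#include <assert.h>
#include <stddef.h>

enum {
	DEMO_GENL_HDR = 4,   /* genlmsghdr */
	DEMO_NLA_HDR  = 4,   /* one attribute header */
	DEMO_PID_LEN  = 4    /* pid/tgid payload */
};

/*
 * Offset of struct taskstats from the start of the netlink payload.
 * pad_attrs is the number of empty (header-only) attributes placed
 * before the aggregate.
 */
static size_t demo_stats_offset(unsigned int pad_attrs)
{
	return DEMO_GENL_HDR
	     + pad_attrs * DEMO_NLA_HDR
	     + DEMO_NLA_HDR      /* aggregate header */
	     + DEMO_NLA_HDR      /* pid header */
	     + DEMO_PID_LEN
	     + DEMO_NLA_HDR;     /* stats header */
}

static int demo_is_aligned8(size_t off)
{
	return off % 8 == 0;
}

/* The two layouts discussed above: without and with one padding attribute. */
static void demo_check_layouts(void)
{
	assert(demo_stats_offset(0) == 20 && !demo_is_aligned8(20));
	assert(demo_stats_offset(1) == 24 && demo_is_aligned8(24));
}
```

One header-only attribute moves the structure from offset 20 to offset 24, which is why 4 bytes of padding are enough and why clients with a hard-coded offset of 20 stop working.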
  24. 17 Dec, 2010 1 commit
  25. 28 Oct, 2010 3 commits
  26. 30 Mar, 2010 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h · 5a0e3ad6
      Tejun Heo authored
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch a large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surroundings.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have a fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
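The first rule above (gfp.h if only gfp is used, slab.h if slab is used) can be sketched as a small decision function. This is not the referenced script: `demo_needed_include()` is a hypothetical name, and the token lists are assumptions chosen for illustration, not the script's actual matching rules.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if any of the n tokens occurs in src. */
static int demo_contains_any(const char *src, const char *const *tokens,
			     size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strstr(src, tokens[i]) != NULL)
			return 1;
	return 0;
}

/*
 * Decide which header a source text needs.
 * Returns "linux/slab.h" if slab facilities are used, "linux/gfp.h" if
 * only gfp facilities are used, and NULL if neither is used.
 */
static const char *demo_needed_include(const char *src)
{
	/* Assumed token lists; a real tool would need a far longer set. */
	static const char *const slab_tokens[] = { "kmalloc(", "kzalloc(", "kfree(" };
	static const char *const gfp_tokens[]  = { "GFP_KERNEL", "GFP_ATOMIC" };

	assert(src != NULL);
	if (demo_contains_any(src, slab_tokens,
			      sizeof(slab_tokens) / sizeof(slab_tokens[0])))
		return "linux/slab.h";
	if (demo_contains_any(src, gfp_tokens,
			      sizeof(gfp_tokens) / sizeof(gfp_tokens[0])))
		return "linux/gfp.h";
	return NULL;
}
```

Checking for slab usage first reflects the commit's statement that slab.h in turn includes gfp.h, so a file that uses both needs only slab.h.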
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  27. 18 Feb, 2010 1 commit
  28. 12 Jul, 2009 1 commit
    • genetlink: make netns aware · 134e6375
      Johannes Berg authored
      This makes generic netlink network namespace aware. No
      generic netlink families except for the controller family
      are made namespace aware; they need to be checked one by
      one and then have their family->netnsok member set to true.
      
      A new function genlmsg_multicast_netns() is introduced to
      allow sending a multicast message in a given namespace,
      for example when it applies to an object that lives in
      that namespace, along with a new function
      genlmsg_multicast_allns() to send a message to all network
      namespaces (for objects that do not have an associated netns).
      
      The function genlmsg_multicast() is changed to multicast
      the message in just init_net, which is currently correct
      for all generic netlink families since they only work in
      init_net right now. Some will later want to work in all
      net namespaces because they do not care about the netns
      at all -- those will have to be converted to use one of
      the new functions genlmsg_multicast_allns() or
      genlmsg_multicast_netns() whenever they are made netns
      aware in some way.
      
      After this patch families can easily decide whether or
      not they should be available in all net namespaces. Many
      genl families use it for objects not related to networking
      and should therefore be available in all namespaces, but
      that will have to be done on a per family basis.
      
      Note that this doesn't touch on the checkpoint/restart
      problem where network namespaces could be used, genl
      families and multicast groups are numbered globally and
      I see no easy way of changing that, especially since it
      must be possible to multicast to all network namespaces
      for those families that do not care about netns.
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  29. 31 Dec, 2008 1 commit
  30. 13 Dec, 2008 1 commit
    • cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and cpulist_scnprintf to take pointers. · 29c0177e
      Rusty Russell authored
      
      Impact: change calling convention of existing cpumask APIs
      
      Most cpumask functions started with cpus_: these have been replaced by
      cpumask_ ones which take struct cpumask pointers as expected.
      
      These four functions don't have good replacement names; fortunately
      they're rarely used, so we just change them over.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Mike Travis <travis@sgi.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: paulus@samba.org
      Cc: mingo@redhat.com
      Cc: tony.luck@intel.com
      Cc: ralf@linux-mips.org
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: cl@linux-foundation.org
      Cc: srostedt@redhat.com
  31. 25 Jul, 2008 1 commit
  32. 23 May, 2008 1 commit
  33. 30 Apr, 2008 1 commit
    • Use find_task_by_vpid in taskstats · cb41d6d0
      Pavel Emelyanov authored
      The pid used to look up a task is passed into the taskstats code via a
      genetlink message.
      
      Since netlink packets are now processed in the context of the sending task,
      it is correct to look up the task with find_task_by_vpid() here.
      
      Besides, I fix the call to fill_pid() from taskstats_exit(), since the
      tsk->pid is not required in fill_pid() in this case, and the pid field on
      task_struct is going to be deprecated as well.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Jay Lan <jlan@engr.sgi.com>
      Cc: Jonathan Lim <jlim@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>