1. 11 Feb, 2007 4 commits
  2. 06 Jan, 2007 1 commit
  3. 22 Dec, 2006 2 commits
  4. 13 Dec, 2006 4 commits
    • [PATCH] SLAB: use a multiply instead of a divide in obj_to_index() · 6a2d7a95
      Eric Dumazet authored
      When some objects are allocated by one CPU but freed by another CPU, we can
      consume a lot of cycles doing divides in obj_to_index().
      (Typical load on a dual processor machine where network interrupts are
      handled by one particular CPU (allocating skbufs), and the other CPU is
      running the application (consuming and freeing skbufs))
      Here on one production server (dual-core AMD Opteron 285), I noticed this
      divide took 1.20 % of CPU_CLK_UNHALTED events in kernel.  But Opterons are
      quite modern CPUs, and the divide is much more expensive on older
      architectures:
      On a 200 MHz sparcv9 machine, the division takes 64 cycles instead of 1
      cycle for a multiply.
      Doing some math, we can use a reciprocal multiplication instead of a divide.
      If we want to compute V = (A / B)  (A and B being u32 quantities)
      we can instead use :
      V = ((u64)A * RECIPROCAL(B)) >> 32 ;
      where RECIPROCAL(B) is precalculated to ((1LL << 32) + (B - 1)) / B
      Note :
      I wrote pure C code for clarity. gcc output for i386 is not optimal but
      acceptable :
      mull   0x14(%ebx)
      mov    %edx,%eax // part of the >> 32
      xor     %edx,%edx // useless
      mov    %eax,(%esp) // could be avoided
      mov    %edx,0x4(%esp) // useless
      mov    (%esp),%ebx
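As a rough illustration of the reciprocal trick described above, here is a standalone C sketch that precomputes the reciprocal once and then replaces each division with a multiply and a shift. The helper names `reciprocal_value`, `reciprocal_divide` and `offset_to_index` are assumptions made for this example and are not quoted from the patch. Two limits are worth keeping in mind: B = 1 would overflow the 32-bit reciprocal, and the result is only guaranteed to equal A / B when A is an exact multiple of B or when A * (B - 1) < 2^32. Offsets of objects inside a slab are exact multiples of the object size, so they fall in the safe case.

```c
#include <assert.h>
#include <stdint.h>

/* Precompute ceil(2^32 / b). Requires b > 1, since b == 1 would need 33 bits. */
static uint32_t reciprocal_value(uint32_t b)
{
    assert(b > 1);
    uint64_t val = (1ULL << 32) + (b - 1);
    return (uint32_t)(val / b);
}

/* Compute a / b from the precomputed reciprocal r: one multiply, one shift. */
static uint32_t reciprocal_divide(uint32_t a, uint32_t r)
{
    return (uint32_t)(((uint64_t)a * r) >> 32);
}

/* Hypothetical stand-in for obj_to_index(): byte offset in a slab -> object index. */
static uint32_t offset_to_index(uint32_t offset, uint32_t reciprocal_size)
{
    return reciprocal_divide(offset, reciprocal_size);
}
```

The reciprocal would typically be computed once, when the object size is fixed, so that only the multiply remains on the hot path.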
      [[email protected]: small cleanups]
      Signed-off-by: Eric Dumazet <[email protected]>
      Cc: Christoph Lameter <[email protected]>
      Cc: David Miller <[email protected]>
      Signed-off-by: Andrew Morton <[email protected]>
      Signed-off-by: Linus Torvalds <[email protected]>
    • [PATCH] cpuset: rework cpuset_zone_allowed api · 02a0e53d
      Paul Jackson authored
      Elaborate the API for calling cpuset_zone_allowed(), so that users have to
      explicitly choose between the two variants: cpuset_zone_allowed_hardwall()
      and cpuset_zone_allowed_softwall().
      Until now, whether or not you got the hardwall flavor depended solely on
      whether or not you or'd in the __GFP_HARDWALL gfp flag to the gfp_mask.
      If you didn't specify __GFP_HARDWALL, you implicitly got the softwall
      version.
      Unfortunately, this meant that users would end up with the softwall version
      without thinking about it.  Since only the softwall version might sleep,
      this led to bugs with possible sleeping in interrupt context on more than
      one occasion.
      The hardwall version requires that the current task's mems_allowed allows
      the node of the specified zone (or that you're in interrupt or that
      __GFP_THISNODE is set or that you're on a one cpuset system.)
      The softwall version, depending on the gfp_mask, might allow a node if it
      was allowed in the nearest enclosing cpuset marked mem_exclusive (which
      requires taking the cpuset lock 'callback_mutex' to evaluate.)
      This patch removes the cpuset_zone_allowed() call, and forces the caller to
      explicitly choose between the hardwall and the softwall case.
      If the caller wants the gfp_mask to determine this choice, they should (1)
      be sure they can sleep or that __GFP_HARDWALL is set, and (2) invoke the
      cpuset_zone_allowed_softwall() routine.
      This adds another 100 or 200 bytes to the kernel text space, due to the few
      lines of nearly duplicate code at the top of both cpuset_zone_allowed_*
      routines.  It should save a few instructions executed for the calls that
      turned into calls of cpuset_zone_allowed_hardwall, thanks to not having to
      set (before the call) then check (within the call) the __GFP_HARDWALL flag.
      For the most critical call, from get_page_from_freelist(), the same
      instructions are executed as before -- the old cpuset_zone_allowed()
      routine it used to call is the same code as the
      cpuset_zone_allowed_softwall() routine that it calls now.
      Not a perfect win, but seems worth it, to reduce this chance of hitting a
      sleeping with irq off complaint again.
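The difference between the two variants can be pictured with a toy userspace model. Everything in it is hypothetical (the `toy_` names, the flag value, and the bitmask representation) and it leaves out the interrupt, __GFP_THISNODE and single-cpuset shortcuts mentioned above. It only shows why the softwall path is the one that may need to consult an enclosing cpuset, which in the kernel means taking a mutex and possibly sleeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_GFP_HARDWALL 0x1u /* hypothetical flag value, not the kernel's */

/* Toy cpuset: allowed nodes as a bitmask, plus the mask of the nearest
 * enclosing mem_exclusive cpuset (which the kernel must lock to read). */
struct toy_cpuset {
    uint32_t mems_allowed;
    uint32_t exclusive_ancestor_mems;
};

/* Hardwall: only the task's own mems_allowed counts. Never needs the lock. */
static bool toy_zone_allowed_hardwall(const struct toy_cpuset *cs, unsigned node)
{
    assert(node < 32);
    return (cs->mems_allowed >> node) & 1u;
}

/* Softwall: may widen the check to the enclosing exclusive cpuset, unless
 * the caller passed the hardwall flag. This last step is the one that
 * would take the lock, so callers must be able to sleep to reach it. */
static bool toy_zone_allowed_softwall(const struct toy_cpuset *cs,
                                      unsigned node, unsigned gfp_mask)
{
    if (toy_zone_allowed_hardwall(cs, node))
        return true;
    if (gfp_mask & TOY_GFP_HARDWALL)
        return false;
    return (cs->exclusive_ancestor_mems >> node) & 1u;
}
```

In this model a caller that cannot sleep would either call the hardwall function directly or pass the hardwall flag to the softwall one, which mirrors the rule stated in the commit message.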
      Signed-off-by: Paul Jackson <[email protected]>
      Signed-off-by: Andrew Morton <[email protected]>
      Signed-off-by: Linus Torvalds <[email protected]>
    • [PATCH] More slab.h cleanups · 55935a34
      Christoph Lameter authored
      More cleanups for slab.h
      1. Remove tabs from weird locations as suggested by Pekka
      2. Drop the check for NUMA and SLAB_DEBUG from the fallback section
         as suggested by Pekka.
      3. Use static inline for the fallback defs as also suggested by Pekka.
      4. Make kmem_ptr_valid take a const * argument.
      5. Separate the NUMA fallback definitions from the kmalloc_track fallback
      Signed-off-by: Christoph Lameter <[email protected]>
      Cc: Pekka Enberg <[email protected]>
      Signed-off-by: Andrew Morton <[email protected]>
      Signed-off-by: Linus Torvalds <[email protected]>
    • [PATCH] slab: fix sleeping in atomic bug · dd47ea75
      Christoph Lameter authored
      fallback_alloc() does not do the check for GFP_WAIT as done in
      cache_grow().  Thus interrupts are disabled when we call kmem_getpages()
      which results in the failure.
      Duplicate the handling of GFP_WAIT in cache_grow().
      Signed-off-by: Christoph Lameter <[email protected]>
      Cc: Jay Cliburn <[email protected]>
      Signed-off-by: Andrew Morton <[email protected]>
      Signed-off-by: Linus Torvalds <[email protected]>
  5. 10 Dec, 2006 1 commit
  6. 08 Dec, 2006 3 commits
  7. 07 Dec, 2006 12 commits
  8. 22 Nov, 2006 2 commits
    • WorkStruct: Pass the work_struct pointer instead of context data · 65f27f38
      David Howells authored
      Pass the work_struct pointer to the work function rather than context data.
      The work function can use container_of() to work out the data.
      For the cases where the container of the work_struct may go away the moment the
      pending bit is cleared, it is made possible to defer the release of the
      structure by deferring the clearing of the pending bit.
      To make this work, an extra flag is introduced into the management side of the
      work_struct.  This governs auto-release of the structure upon execution.
      Ordinarily, the work queue executor would release the work_struct for further
      scheduling or deallocation by clearing the pending bit prior to jumping to the
      work function.  This means that, unless the driver makes some guarantee itself
      that the work_struct won't go away, the work function may not access anything
      else in the work_struct or its container lest they be deallocated.  This is a
      problem if the auxiliary data is taken away (as done by the last patch).
      However, if the pending bit is *not* cleared before jumping to the work
      function, then the work function *may* access the work_struct and its container
      with no problems.  But then the work function must itself release the
      work_struct by calling work_release().
      In most cases, automatic release is fine, so this is the default.  Special
      initiators exist for the non-auto-release case (ending in _NAR).
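The container_of() pattern mentioned above can be sketched in plain userspace C. The macro below is a simplified stand-in for the kernel's, and `struct my_device`, `my_device_work` and `run_work` are hypothetical names used only for this illustration. The pending bit and the release logic are not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Minimal work item: just the callback, which receives the item itself. */
struct work_struct {
    void (*func)(struct work_struct *work);
};

/* Hypothetical driver structure that embeds its work item. */
struct my_device {
    int irq_count;
    struct work_struct work;
};

/* The handler is given only the work_struct and recovers its container. */
static void my_device_work(struct work_struct *work)
{
    struct my_device *dev = container_of(work, struct my_device, work);
    dev->irq_count++;
}

/* Hypothetical stand-in for the work queue executor. */
static void run_work(struct work_struct *work)
{
    assert(work->func != NULL);
    work->func(work);
}
```

Because the handler derives its data from the embedded member's address, no separate context pointer has to be stored in the work item.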
      Signed-Off-By: David Howells <[email protected]>
    • WorkStruct: Separate delayable and non-delayable events. · 52bad64d
      David Howells authored
      Separate delayable work items from non-delayable work items by splitting them
      into a separate structure (delayed_work), which incorporates a work_struct and
      the timer_list removed from work_struct.
      The work_struct struct is huge, and this limits its usefulness.  On a 64-bit
      architecture it's nearly 100 bytes in size.  This reduces that by half for the
      non-delayable type of event.
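The split can be pictured with the following sketch. The field layouts are placeholders invented for illustration (the `toy_` structs are not the kernel's definitions), so the sizes it produces are not the figures quoted above. It only shows the shape of the change: the timer lives in a wrapper, and the plain work item no longer pays for it.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder for the kernel's timer_list; the real fields differ. */
struct toy_timer_list {
    void *entry[2];
    unsigned long expires;
    void (*function)(unsigned long);
    unsigned long data;
};

/* Non-delayable work item: no timer embedded. */
struct toy_work_struct {
    unsigned long management; /* pending bit and flags */
    void *entry[2];           /* list linkage */
    void (*func)(struct toy_work_struct *work);
};

/* Delayable work item: wraps the plain item and adds the timer. */
struct toy_delayed_work {
    struct toy_work_struct work;
    struct toy_timer_list timer;
};

/* The wrapper is strictly larger, so plain work items save the timer's space. */
static_assert(sizeof(struct toy_delayed_work) > sizeof(struct toy_work_struct),
              "delayed work must carry the extra timer");
```

Code that only needs immediate execution would embed the smaller structure, while code that schedules work for later would embed the wrapper.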
      Signed-Off-By: David Howells <[email protected]>
  9. 03 Nov, 2006 1 commit
  10. 21 Oct, 2006 1 commit
  11. 07 Oct, 2006 1 commit
  12. 06 Oct, 2006 1 commit
  13. 04 Oct, 2006 2 commits
  14. 29 Sep, 2006 1 commit
  15. 27 Sep, 2006 3 commits
    • [PATCH] GFP_THISNODE for the slab allocator · 765c4507
      Christoph Lameter authored
      This patch ensures that the slab node lists in the NUMA case only contain
      slabs that belong to that specific node.  All slab allocations use
      GFP_THISNODE when calling into the page allocator.  If an allocation fails
      then we fall back in the slab allocator according to the zonelists appropriate
      for a certain context.
      This allows a replication of the behavior of alloc_pages and alloc_pages_node
      in the slab layer.
      Currently allocations requested from the page allocator may be redirected via
      cpusets to other nodes.  This results in remote pages on nodelists and that in
      turn results in interrupt latency issues during cache draining.  Plus the slab
      is handing out memory as local when it is really remote.
      Fallback for slab memory allocations will occur within the slab allocator and
      not in the page allocator.  This is necessary in order to be able to use the
      existing pools of objects on the nodes that we fall back to before adding more
      pages to a slab.
      The fallback function ensures that the nodes we fall back to obey cpuset
      restrictions of the current context.  We do not allocate objects from outside
      of the current cpuset context like before.
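A toy model of this fallback order is sketched below. It is a deliberate simplification with hypothetical names (`toy_cache`, `toy_alloc_this_node`, `toy_alloc_with_fallback`): per-node counters stand in for the per-node slab lists, a bitmask stands in for the cpuset restriction, and growing a slab with new pages is not modelled at all.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_NODES 4

/* Toy per-node object pools; a real slab cache keeps per-node slab lists. */
struct toy_cache {
    int free_objects[TOY_MAX_NODES];
};

/* Take one object strictly from the given node, like a GFP_THISNODE request.
 * Returns the node on success, or -1 if that node has nothing free. */
static int toy_alloc_this_node(struct toy_cache *c, int node)
{
    assert(node >= 0 && node < TOY_MAX_NODES);
    if (c->free_objects[node] == 0)
        return -1;
    c->free_objects[node]--;
    return node;
}

/* Try the local node first, then walk a fallback order, skipping nodes
 * that the (hypothetical) cpuset mask does not allow.
 * Returns the node that supplied the object, or -1 if none could. */
static int toy_alloc_with_fallback(struct toy_cache *c, int local_node,
                                   const int *fallback_order, int n,
                                   uint32_t allowed_mask)
{
    int node = toy_alloc_this_node(c, local_node);

    for (int i = 0; node < 0 && i < n; i++) {
        int candidate = fallback_order[i];

        if (!((allowed_mask >> candidate) & 1u))
            continue; /* outside the current cpuset context */
        node = toy_alloc_this_node(c, candidate);
    }
    return node;
}
```

Keeping the walk inside the cache is what lets existing free objects on other nodes be used before any new pages are requested.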
      Note that the implementation of locality constraints within the slab allocator
      requires importing logic from the page allocator.  This is a mishmash that is
      not that great.  Other allocators (uncached allocator, vmalloc, huge pages)
      face similar problems and have similar minimal reimplementations of the basic
      fallback logic of the page allocator.  There is another way of implementing a
      slab by avoiding per node lists (see modular slab) but this won't work within
      the existing slab.
      - Use NUMA_BUILD to avoid #ifdef CONFIG_NUMA
      - Exploit GFP_THISNODE being 0 in the NON_NUMA case to avoid another #ifdef
      [[email protected]: build fix]
      Signed-off-by: Christoph Lameter <[email protected]>
      Signed-off-by: Andrew Morton <[email protected]>
      Signed-off-by: Linus Torvalds <[email protected]>
    • [PATCH] slab: fix kmalloc_node applying memory policies if nodeid == numa_node_id() · de3083ec
      Christoph Lameter authored
      kmalloc_node() falls back to ___cache_alloc() under certain conditions and
      at that point memory policies may be applied redirecting the allocation
      away from the current node.  Therefore kmalloc_node(...,numa_node_id()) or
      kmalloc_node(...,-1) may not return memory from the local node.
      Fix this by doing the policy check in __cache_alloc() instead of
      This version here is a cleanup of Kiran's patch.
      - Tested on ia64.
      - Extra material removed.
      - Consolidate the exit path if alternate_node_alloc() returned an object.
      [[email protected]: warning fix]
      Signed-off-by: Alok N Kataria <[email protected]>
      Signed-off-by: Ravikiran Thirumalai <[email protected]>
      Signed-off-by: Shai Fultheim <[email protected]>
      Signed-off-by: Christoph Lameter <[email protected]>
      Signed-off-by: Andrew Morton <[email protected]>
      Signed-off-by: Linus Torvalds <[email protected]>
    • [PATCH] Make kmem_cache_destroy() return void · 133d205a
      Alexey Dobriyan authored
      un-, de-, -free, -destroy, -exit, etc functions should in general return
      void.  Also, there is very little that, say, filesystem driver code can do
      upon a failed kmem_cache_destroy().  If it is decided to BUG in this case,
      the BUG should be put in generic code instead.
      Signed-off-by: Alexey Dobriyan <[email protected]>
      Signed-off-by: Andrew Morton <[email protected]>
      Signed-off-by: Linus Torvalds <[email protected]>
  16. 26 Sep, 2006 1 commit