1. 02 Apr, 2018 1 commit
  2. 02 Nov, 2017 1 commit
    • License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman authored
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boilerplate text.
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      How this work was done:
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information in it,
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      The analysis to determine which SPDX License Identifier should be applied
      to a file was done in a spreadsheet of side by side results from the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      Criteria used to select files for SPDX license identifier tagging were:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
      All documentation files were explicitly excluded.
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
         For non */uapi/* files that summary was:
         SPDX license identifier                            # files
         GPL-2.0                                              11139
         and resulted in the first patch in this series.
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that were:
         SPDX license identifier                            # files
         GPL-2.0 WITH Linux-syscall-note                        930
         and resulted in the second patch in this series.
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
         SPDX license identifier                            # files
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
         and that resulted in the third patch in this series.
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there were new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      In the initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
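The tagging rules described above can be illustrated with a minimal C sketch. This is not the script Thomas and Greg used; the function names `default_spdx_license` and `format_spdx_line` are hypothetical. It assumes the kernel convention that .c files carry the tag in a `//` comment and headers in a `/* */` comment, and it applies the uapi default only to files that had no license text at all.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Default identifier for a file with no license text: uapi paths get the
 * syscall note, everything else falls back to plain GPL-2.0.
 */
const char *default_spdx_license(const char *path)
{
	return strstr(path, "/uapi/") ? "GPL-2.0 WITH Linux-syscall-note"
				      : "GPL-2.0";
}

/*
 * Write the first-line SPDX tag for `path` into `out`.
 * Returns the formatted length, or -1 for file types this sketch ignores.
 */
int format_spdx_line(const char *path, const char *license,
		     char *out, size_t outlen)
{
	const char *ext = strrchr(path, '.');

	assert(out != NULL && outlen > 0);
	if (ext != NULL && strcmp(ext, ".h") == 0)
		return snprintf(out, outlen,
				"/* SPDX-License-Identifier: %s */", license);
	if (ext != NULL && strcmp(ext, ".c") == 0)
		return snprintf(out, outlen,
				"// SPDX-License-Identifier: %s", license);
	return -1; /* Makefiles, scripts etc. need other comment styles */
}
```

A real tool would also have to handle assembly, Makefiles and scripts, each of which needs its own comment syntax.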
      Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  3. 01 Aug, 2017 1 commit
  4. 25 Dec, 2016 1 commit
    • ktime: Get rid of the union · 2456e855
      Thomas Gleixner authored
      ktime is a union because the initial implementation stored the time in
      scalar nanoseconds on 64-bit machines and in an endianness-optimized timespec
      variant for 32-bit machines. The Y2038 cleanup removed the timespec variant
      and switched everything to scalar nanoseconds. The union remained, but
      became completely pointless.
      Get rid of the union and just keep ktime_t as a simple typedef of type s64.
      The conversion was done with coccinelle and some manual mopping up.
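As a rough illustration of the change, the sketch below contrasts a single-member union with a plain typedef. It is a userspace approximation: `s64` is stood in for by `int64_t`, and the names `union ktime_old`, `ktime_old_add_ns` and `ktime_add_ns_sketch` are made up for this example rather than taken from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's s64 type (assumption of this sketch). */
typedef int64_t s64;

/* Before: the union wrapper left over once only the scalar member remained. */
union ktime_old {
	s64 tv64;
};

/* After: ktime_t is a plain signed 64-bit nanosecond count. */
typedef s64 ktime_t;

/* The union added no storage, only syntax. */
static_assert(sizeof(ktime_t) == sizeof(union ktime_old),
	      "dropping the union must not change the size");

/* Old style: every access has to go through the .tv64 member. */
union ktime_old ktime_old_add_ns(union ktime_old kt, s64 ns)
{
	kt.tv64 += ns;
	return kt;
}

/* New style: ordinary integer arithmetic works directly on the value. */
ktime_t ktime_add_ns_sketch(ktime_t kt, s64 ns)
{
	return kt + ns;
}
```

With the typedef, callers can compare and add ktime_t values directly, which is the kind of mechanical rewrite a tool such as coccinelle is well suited for.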
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
  5. 03 Mar, 2014 1 commit
  6. 13 Oct, 2012 1 commit
  7. 07 Oct, 2009 1 commit
  8. 24 Sep, 2009 1 commit
  9. 30 Apr, 2009 1 commit
    • futex: remove FUTEX_REQUEUE_PI (non CMP) · ba9c22f2
      Darren Hart authored
      The new requeue PI futex op codes were modeled after the existing
      FUTEX_REQUEUE and FUTEX_CMP_REQUEUE calls.  I was unaware at the time
      that FUTEX_REQUEUE was only around for compatibility reasons and
      shouldn't be used in new code.  Ulrich Drepper elaborates on this in his
      Futexes are Tricky paper: http://people.redhat.com/drepper/futex.pdf.
      The deprecated call doesn't catch changes to the futex corresponding to
      the destination futex which can lead to deadlock.
      Therefore, I feel it best to remove FUTEX_REQUEUE_PI and leave only
      FUTEX_CMP_REQUEUE_PI as there are not yet any existing users of the API.
      This patch does change the OP code value of FUTEX_CMP_REQUEUE_PI to 12
      from 13.  Since my test case is the only known user of this API, I felt
      this was the right thing to do, rather than leave a hole in the
      I chose to continue using the _CMP_ modifier in the OP code to make it
      explicit to the user that the test is being done.
      Builds, boots, and ran several hundred iterations of requeue_pi.c.
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      LKML-Reference: <49ED580E.1050502@us.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  10. 06 Apr, 2009 1 commit
    • futex: add requeue_pi functionality · 52400ba9
      Darren Hart authored
      PI Futexes and their underlying rt_mutex cannot be left ownerless if
      there are pending waiters as this will break the PI boosting logic, so
      the standard requeue commands aren't sufficient.  The new commands
      properly manage pi futex ownership by ensuring a futex with waiters
      has an owner at all times.  This will allow glibc to properly handle
      pi mutexes with pthread_condvars.
      The approach taken here is to create two new futex op codes:
      Tasks will use this op code to wait on a futex (such as a non-pi waitqueue)
      and wake after they have been requeued to a pi futex.  Prior to returning to
      userspace, they will acquire this pi futex (and the underlying rt_mutex).
      futex_wait_requeue_pi() is the result of a high speed collision between
      futex_wait() and futex_lock_pi() (with the first part of futex_lock_pi() being
      done by futex_proxy_trylock_atomic() on behalf of the top_waiter).
      This call must be used to wake tasks waiting with FUTEX_WAIT_REQUEUE_PI,
      regardless of how many tasks the caller intends to wake or requeue.
      pthread_cond_broadcast() should call this with nr_wake=1 and
      nr_requeue=INT_MAX.  pthread_cond_signal() should call this with nr_wake=1 and
      nr_requeue=0.  The reason being we need both callers to get the benefit of the
      futex_proxy_trylock_atomic() routine.  futex_requeue() also enqueues the
      top_waiter on the rt_mutex via rt_mutex_start_proxy_lock().
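The nr_wake/nr_requeue convention described above can be sketched as follows. This is only an illustration, not glibc's implementation: the helper names `requeue_pi_args` and `cond_requeue_pi` are hypothetical, and the syscall uses FUTEX_CMP_REQUEUE_PI as defined in today's <linux/futex.h>, which is an assumption about the final op code name rather than something this commit message states.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

struct requeue_args {
	int nr_wake;
	int nr_requeue;
};

/* signal: wake one, requeue none; broadcast: wake one, requeue the rest. */
struct requeue_args requeue_pi_args(int broadcast)
{
	struct requeue_args a = {
		.nr_wake = 1,
		.nr_requeue = broadcast ? INT_MAX : 0,
	};
	return a;
}

/*
 * Wake/requeue waiters from the condvar futex to the PI mutex futex.
 * `expected` is the value the condvar futex is assumed to still hold.
 */
long cond_requeue_pi(uint32_t *cond_futex, uint32_t *pi_mutex_futex,
		     uint32_t expected, int broadcast)
{
	struct requeue_args a = requeue_pi_args(broadcast);

	assert(cond_futex != pi_mutex_futex);
	/* For requeue ops the 4th argument carries nr_requeue, not a timeout. */
	return syscall(SYS_futex, cond_futex, FUTEX_CMP_REQUEUE_PI, a.nr_wake,
		       (unsigned long)a.nr_requeue, pi_mutex_futex, expected);
}
```

Both callers keep nr_wake at 1 so that even a plain signal goes through the proxy trylock path.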
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  11. 24 Nov, 2008 1 commit
    • futex: make clock selectable for FUTEX_WAIT_BITSET · 1acdac10
      Thomas Gleixner authored
      FUTEX_WAIT_BITSET could be used instead of FUTEX_WAIT by setting the
      Add a flag to select CLOCK_REALTIME for FUTEX_WAIT_BITSET so glibc can
      replace the FUTEX_WAIT logic which needs to do gettimeofday() calls
      before and after the syscall to convert the absolute timeout to a
      relative timeout for FUTEX_WAIT.
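A userspace caller might use the new flag roughly as in the sketch below. The wrapper name `futex_wait_abs_realtime` is hypothetical; FUTEX_CLOCK_REALTIME and FUTEX_BITSET_MATCH_ANY are the constants exported by <linux/futex.h> today, which is assumed to match what this commit introduced.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/*
 * Block while *uaddr == expected, until an absolute CLOCK_REALTIME deadline.
 * Returns 0 on wakeup or a negative errno value (e.g. -EAGAIN, -ETIMEDOUT).
 */
int futex_wait_abs_realtime(uint32_t *uaddr, uint32_t expected,
			    const struct timespec *deadline)
{
	long rc;

	assert(uaddr != NULL);
	/* Matching any bit makes the bitset variant behave like FUTEX_WAIT. */
	rc = syscall(SYS_futex, uaddr,
		     FUTEX_WAIT_BITSET | FUTEX_CLOCK_REALTIME,
		     expected, deadline, NULL, FUTEX_BITSET_MATCH_ANY);
	return rc == -1 ? -errno : 0;
}
```

Because the deadline is absolute, the caller no longer needs to read the clock around the syscall to turn it into a relative timeout.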
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ulrich Drepper <drepper@redhat.com>
  12. 30 Sep, 2008 1 commit
  13. 24 Feb, 2008 1 commit
    • futex: runtime enable pi and robust functionality · a0c1e907
      Thomas Gleixner authored
      Not all architectures implement futex_atomic_cmpxchg_inatomic().  The default
      implementation returns -ENOSYS, which is currently not handled inside of the
      futex guts.
      Futex PI calls and robust list exits with a held futex result in an endless
      loop in the futex code on architectures which have no support.
      Fixing up every place where futex_atomic_cmpxchg_inatomic() is called would
      add a fair amount of extra if/else constructs to the already complex code.  It
      is also not possible to disable the robust feature before user space tries to
      register robust lists.
      Compile time disabling is not a good idea either, as there are already
      architectures with runtime detection of futex_atomic_cmpxchg_inatomic support.
      Detect the functionality at runtime instead by calling
      cmpxchg_futex_value_locked() with a NULL pointer from the futex initialization
      code.  This is guaranteed to fail, but the call of
      futex_atomic_cmpxchg_inatomic() happens with pagefaults disabled.
      On architectures which use the asm-generic implementation or have runtime
      CPU feature detection, a -ENOSYS return value disables the PI/robust features.
      On architectures with a working implementation the call returns -EFAULT and
      the PI/robust features are enabled.
      The relevant syscalls return -ENOSYS and the robust list exit code is blocked,
      when the detection fails.
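The detection logic can be modelled in userspace as below. This is a simplified sketch under stated assumptions: the two `cmpxchg_*` functions are hypothetical stand-ins for an architecture without and with a working futex_atomic_cmpxchg_inatomic(), and `futex_cmpxchg_probe` and `pi_syscall_gate` are illustrative names, not kernel symbols.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*cmpxchg_fn)(uint32_t *uaddr, uint32_t oldval, uint32_t newval);

/* Stand-in for an architecture using the generic stub: always -ENOSYS. */
int cmpxchg_unsupported(uint32_t *uaddr, uint32_t oldval, uint32_t newval)
{
	(void)uaddr;
	(void)oldval;
	(void)newval;
	return -ENOSYS;
}

/* Stand-in for a working implementation: a NULL user pointer faults. */
int cmpxchg_supported(uint32_t *uaddr, uint32_t oldval, uint32_t newval)
{
	if (uaddr == NULL)
		return -EFAULT;
	if (*uaddr == oldval)
		*uaddr = newval;
	return 0;
}

/* Probe once at init: only -EFAULT proves the operation really exists. */
bool futex_cmpxchg_probe(cmpxchg_fn cmpxchg)
{
	assert(cmpxchg != NULL);
	return cmpxchg(NULL, 0, 0) == -EFAULT;
}

/* PI and robust entry points bail out early when the probe failed. */
int pi_syscall_gate(bool cmpxchg_enabled)
{
	return cmpxchg_enabled ? 0 : -ENOSYS;
}
```

Probing with a NULL pointer is safe because the call is guaranteed to fail either way; only the error code differs.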
      Fixes http://lkml.org/lkml/2008/2/11/149
      Originally reported by: Lennert Buytenhek
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Lennert Buytenhek <buytenh@wantstofly.org>
      Cc: Riku Voipio <riku.voipio@movial.fi>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 01 Feb, 2008 1 commit
    • futex: Add bitset conditional wait/wakeup functionality · cd689985
      Thomas Gleixner authored
      To allow the implementation of optimized rw-locks in user space, glibc
      needs a possibility to select waiters for wakeup depending on a bitset
      This requires two new futex OPs: FUTEX_WAIT_BITS and FUTEX_WAKE_BITS
      These OPs are basically the same as FUTEX_WAIT and FUTEX_WAKE plus an
      additional argument - a bitset. Further the FUTEX_WAIT_BITS OP is
      expecting an absolute timeout value instead of the relative one, which
      is used for the FUTEX_WAIT OP.
      FUTEX_WAIT_BITS calls into the kernel with a bitset. The bitset is
      stored in the futex_q structure, which is used to enqueue the waiter
      into the hashed futex waitqueue.
      FUTEX_WAKE_BITS also calls into the kernel with a bitset. The wakeup
      function logically ANDs the bitset with the bitset stored in each
      waiter's futex_q structure. If the result is zero (i.e. the two bitsets
      have no set bit in common), then the waiter is not woken up. If
      the result is not zero (i.e. at least one set bit is common to both
      bitsets), then the waiter is woken.
      The bitset provided by the caller must be non zero. In case the
      provided bitset is zero the kernel returns EINVAL.
      Internally the new OPs are only extensions to the existing FUTEX_WAIT
      and FUTEX_WAKE functions. The existing OPs hand a bitset with all bits
      set into the futex_wait() and futex_wake() functions.
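The matching rule can be shown with a small self-contained model. The `struct waiter` queue and the `wake_bitset` function are hypothetical simplifications of the hashed futex_q waitqueue, meant only to demonstrate the AND test and the EINVAL case.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Plain wait/wake hand in a bitset with all bits set. */
#define BITSET_ALL 0xffffffffu

/* Simplified waiter: only the stored bitset and a woken flag. */
struct waiter {
	uint32_t bitset;
	int woken;
};

/*
 * Wake up to nr_wake waiters whose stored bitset shares at least one set
 * bit with `bitset`. Returns the number woken, or -EINVAL for a zero bitset.
 */
int wake_bitset(struct waiter *q, size_t n, int nr_wake, uint32_t bitset)
{
	int woken = 0;

	assert(q != NULL || n == 0);
	if (bitset == 0)
		return -EINVAL;

	for (size_t i = 0; i < n && woken < nr_wake; i++) {
		if (q[i].woken || (q[i].bitset & bitset) == 0)
			continue; /* no common bit: leave this waiter asleep */
		q[i].woken = 1;
		woken++;
	}
	return woken;
}
```

A waiter queued with BITSET_ALL is matched by every non-zero wake bitset, which is how the existing operations keep their old behaviour.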
      Signed-off-by: Thomas Gleixner <tgxl@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  15. 25 Jan, 2008 1 commit
  16. 05 Nov, 2007 1 commit
  17. 18 Jun, 2007 1 commit
    • Revert "futex_requeue_pi optimization" · bd197234
      Thomas Gleixner authored
      This reverts commit d0aa7a70.
      It not only introduced user space visible changes to the futex syscall,
      it is also non-functional and there is no way to fix it properly before
      the 2.6.22 release.
      The breakage report ( http://lkml.org/lkml/2007/5/12/17 ) went
      unanswered, and unfortunately it turned out that the concept is not
      feasible at all.  It violates the rtmutex semantics badly by introducing
      a virtual owner, which hacks around the coupling of the user-space
      pi_futex and the kernel internal rt_mutex representation.
      At the moment the only safe option is to remove it fully as it contains
      user-space visible changes to broken kernel code, which we do not want
      to expose in the 2.6.22 release.
      The patch reverts the original patch mostly 1:1, but contains a couple
      of trivial manual cleanups which were necessary due to patches, which
      touched the same area of code later.
      Verified against the glibc tests and my own PI futex tests.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Ulrich Drepper <drepper@redhat.com>
      Cc: Pierre Peiffer <pierre.peiffer@bull.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 09 May, 2007 3 commits
    • FUTEX: new PRIVATE futexes · 34f01cc1
      Eric Dumazet authored
        Analysis of current linux futex code :
      A central hash table futex_queues[] holds all contexts (futex_q) of waiting
      Each futex_wait()/futex_wake() has to obtain a spinlock on a hash slot to
      perform lookups or insert/deletion of a futex_q.
      When a futex_wait() is done, calling thread has to :
      1) - Obtain a read lock on mmap_sem to be able to validate the user pointer
           (calling find_vma()). This validation tells us if the futex uses
           an inode based store (mapped file), or mm based store (anonymous mem)
      2) - compute a hash key
      3) - Atomic increment of reference counter on an inode or a mm_struct
      4) - lock part of futex_queues[] hash table
      5) - perform the test on value of futex.
      	(rollback if value != expected_value, returns EWOULDBLOCK)
      	(various loops if test triggers mm faults)
      6) queue the context into hash table, release the lock got in 4)
      7) - release the read_lock on mmap_sem
      8) Eventually unqueue the context (but rarely, as this part  may be done
         by the futex_wake())
      Futexes were designed to improve scalability but current implementation has
      various problems :
      - Central hashtable :
        This means scalability problems if many processes/threads want to use
        futexes at the same time.
        This means NUMA imbalance because this hashtable is located on one node.
      - Using mmap_sem on every futex() syscall :
        Even if mmap_sem is a rw_semaphore, up_read()/down_read() are doing atomic
        ops on mmap_sem, dirtying cache line :
          - lot of cache line ping pongs on SMP configurations.
        mmap_sem is also extensively used by mm code (page faults, mmap()/munmap())
        Highly threaded processes might suffer from mmap_sem contention.
        mmap_sem is also used by oprofile code. Enabling oprofile hurts threaded
        programs because of contention on the mmap_sem cache line.
      - Using an atomic_inc()/atomic_dec() on inode ref counter or mm ref counter:
        It's also a cache line ping pong on SMP. It also increases mmap_sem hold time
        because of cache misses.
      Most of these scalability problems come from the fact that futexes are in
      one global namespace.  As we use a central hash table, we must make sure
      they are all using the same reference (given by the mm subsystem).  We
      chose to force all futexes to be 'shared'.  This has a cost.
      But fact is POSIX defined PRIVATE and SHARED, allowing clear separation,
      and optimal performance if carefully implemented.  Time has come for linux
      to have better threading performance.
      The goal is to permit new futex commands to avoid :
       - Taking the mmap_sem semaphore, conflicting with other subsystems.
       - Modifying a ref_count on mm or an inode, still conflicting with mm or fs.
      This is possible because, for one process using PTHREAD_PROCESS_PRIVATE
      futexes, we only need to distinguish futexes by their virtual address, no
      matter what the underlying mm storage is.
      If glibc wants to exploit this new infrastructure, it should use new
      _PRIVATE futex subcommands for PTHREAD_PROCESS_PRIVATE futexes.  And be
      prepared to fall back to old subcommands for old kernels.  Using one global
      variable with the FUTEX_PRIVATE_FLAG or 0 value should be OK.
      PTHREAD_PROCESS_SHARED futexes should still use the old subcommands.
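The "one global variable" fallback suggested above could look roughly like this. It is a sketch, not glibc code: `futex_private_flag` and `futex_wake_private` are hypothetical names, and it assumes that a kernel without _PRIVATE support rejects the unknown op code with ENOSYS.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* FUTEX_PRIVATE_FLAG while the kernel accepts it, 0 after falling back. */
int futex_private_flag = FUTEX_PRIVATE_FLAG;

/*
 * Wake up to nr_wake waiters on a process-private futex, retrying with the
 * old shared op code if the kernel does not know the private one.
 */
long futex_wake_private(uint32_t *uaddr, int nr_wake)
{
	long rc;

	assert(uaddr != NULL);
	rc = syscall(SYS_futex, uaddr, FUTEX_WAKE | futex_private_flag,
		     nr_wake, NULL, NULL, 0);
	if (rc == -1 && errno == ENOSYS && futex_private_flag != 0) {
		/* Old kernel: remember that and use the shared variant. */
		futex_private_flag = 0;
		rc = syscall(SYS_futex, uaddr, FUTEX_WAKE,
			     nr_wake, NULL, NULL, 0);
	}
	return rc;
}
```

After the first failed attempt the flag stays 0, so later calls pay no extra syscall on an old kernel.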
      Compatibility with old applications is preserved, they still hit the
      scalability problems, but new applications can fly :)
      Note : the same SHARED futex (mapped on a file) can be used by old binaries
      *and* new binaries, because both binaries will use the old subcommands.
      Note : Vast majority of futexes should be using PROCESS_PRIVATE semantic,
      as this is the default semantic. Almost all applications should benefit
      from these changes (new kernel and updated libc)
      Some bench results on a Pentium M 1.6 GHz (SMP kernel on a UP machine)
      /* calling futex_wait(addr, value) with value != *addr */
      433 cycles per futex(FUTEX_WAIT) call (mixing 2 futexes)
      424 cycles per futex(FUTEX_WAIT) call (using one futex)
      334 cycles per futex(FUTEX_WAIT_PRIVATE) call (mixing 2 futexes)
      334 cycles per futex(FUTEX_WAIT_PRIVATE) call (using one futex)
      For reference :
      187 cycles per getppid() call
      188 cycles per umask() call
      181 cycles per ni_syscall() call
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Pierre Peiffer <pierre.peiffer@bull.net>
      Cc: "Ulrich Drepper" <drepper@gmail.com>
      Cc: "Nick Piggin" <nickpiggin@yahoo.com.au>
      Cc: "Ingo Molnar" <mingo@elte.hu>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • futex_requeue_pi optimization · d0aa7a70
      Pierre Peiffer authored
      This patch provides the futex_requeue_pi functionality, which allows some
      threads waiting on a normal futex to be requeued on the wait-queue of a
      This provides an optimization, already used for (normal) futexes, to be used
      with the PI-futexes.
      This optimization is currently used by glibc in pthread_cond_broadcast, when
      using "normal" mutexes.  With futex_requeue_pi, it can be used with
      PRIO_INHERIT mutexes too.
      Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Make futex_wait() use an hrtimer for timeout · c19384b5
      Pierre Peiffer authored
      This patch modifies futex_wait() to use an hrtimer + schedule() in place of
      schedule_timeout() is tick based, therefore the timeout granularity is the
      tick (1 ms, 4 ms or 10 ms depending on HZ).  By using a high resolution timer
      for timeout wakeup, we can attain a much finer timeout granularity (in the
      microsecond range).  This parallels what is already done for futex_lock_pi().
      The timeout passed to the syscall is no longer converted to jiffies; it is
      passed to do_futex() and futex_wait() as an absolute ktime_t, which keeps
      nanosecond resolution.
      Also this removes the need to pass the nanoseconds timeout part to
      futex_lock_pi() in val2.
      In futex_wait(), if there is no timeout then a regular schedule() is
      performed.  Otherwise, an hrtimer is started before schedule() is called.
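The granularity argument can be made concrete with a little arithmetic. The sketch below is a deliberately simplified model, not the kernel's jiffies conversion: `timespec_to_ns_sketch` and `ns_round_up_to_tick` are hypothetical helpers that show how a tick-based timer can only expire on a tick boundary, while a nanosecond value keeps the requested resolution.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Keep the full resolution of the timeout as a scalar nanosecond count. */
int64_t timespec_to_ns_sketch(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/*
 * Model of a tick-based timeout: the request is rounded up to a whole
 * number of ticks, where one tick lasts 1/hz seconds.
 */
int64_t ns_round_up_to_tick(int64_t ns, int hz)
{
	int64_t tick_ns;

	assert(hz > 0);
	tick_ns = NSEC_PER_SEC / hz;
	return ((ns + tick_ns - 1) / tick_ns) * tick_ns;
}
```

For a 150 microsecond timeout, this model gives 1 ms at HZ=1000, 4 ms at HZ=250 and 10 ms at HZ=100, whereas the nanosecond value stays at 150000.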
      [akpm@linux-foundation.org: fix `make headers_check']
      Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
      Signed-off-by: Pierre Peiffer <pierre.peiffer@bull.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  19. 08 May, 2007 1 commit
  20. 10 Dec, 2006 1 commit
  21. 29 Jul, 2006 1 commit
  22. 28 Jun, 2006 2 commits
    • [PATCH] pi-futex: futex_lock_pi/futex_unlock_pi support · c87e2837
      Ingo Molnar authored
      This adds the actual pi-futex implementation, based on rt-mutexes.
      [dino@in.ibm.com: fix an oops-causing race]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] pi-futex: futex code cleanups · e2970f2f
      Ingo Molnar authored
      We are pleased to announce "lightweight userspace priority inheritance" (PI)
      support for futexes.  The following patchset and glibc patch implement it,
      on top of the robust-futexes patchset which is included in 2.6.16-mm1.
      We are calling it lightweight for 3 reasons:
       - in the user-space fastpath a PI-enabled futex involves no kernel work
         (or any other PI complexity) at all.  No registration, no extra kernel
         calls - just pure fast atomic ops in userspace.
       - in the slowpath (in the lock-contention case), the system call and
         scheduling pattern is in fact better than that of normal futexes, due to
         the 'integrated' nature of FUTEX_LOCK_PI.  [more about that further down]
       - the in-kernel PI implementation is streamlined around the mutex
         abstraction, with strict rules that keep the implementation relatively
         simple: only a single owner may own a lock (i.e.  no read-write lock
         support), only the owner may unlock a lock, no recursive locking, etc.
        Priority Inheritance - why, oh why???
      Many of you heard the horror stories about the evil PI code circling Linux for
      years, which makes no real sense at all and is only used by buggy applications
      and which has horrible overhead.  Some of you have dreaded this very moment,
      when someone actually submits working PI code ;-)
      So why would we like to see PI support for futexes?
      We'd like to see it done purely for technological reasons.  We don't think it's
      a buggy concept, we think it's useful functionality to offer to applications,
      which functionality cannot be achieved in other ways.  We also think it's the
      right thing to do, and we think we've got the right arguments and the right
      numbers to prove that.  We also believe that we can address all the
      counter-arguments as well.  For these reasons (and the reasons outlined below)
      we are submitting this patch-set for upstream kernel inclusion.
      What are the benefits of PI?
        The short reply:
      User-space PI helps achieve/improve determinism for user-space
      applications.  In the best-case, it can help achieve determinism and
      well-bound latencies.  Even in the worst-case, PI will improve the statistical
      distribution of locking related application delays.
        The longer reply:
      Firstly, sharing locks between multiple tasks is a common programming
      technique that often cannot be replaced with lockless algorithms.  As we can
      see it in the kernel [which is a quite complex program in itself], lockless
      structures are the exception rather than the norm - the current ratio of
      lockless vs.  locky code for shared data structures is somewhere between 1:10
      and 1:100.  Lockless is hard, and the complexity of lockless algorithms often
      endangers the ability to do robust reviews of said code.  I.e.  critical RT
      apps often choose lock structures to protect critical data structures, instead
      of lockless algorithms.  Furthermore, there are cases (like shared hardware,
      or other resource limits) where lockless access is mathematically impossible.
      Media players (such as Jack) are an example of reasonable application design
      with multiple tasks (with multiple priority levels) sharing short-held locks:
      for example, a highprio audio playback thread is combined with medium-prio
      construct-audio-data threads and low-prio display-colory-stuff threads.  Add
      video and decoding to the mix and we've got even more priority levels.
      So once we accept that synchronization objects (locks) are an unavoidable fact
      of life, and once we accept that multi-task userspace apps have a very fair
      expectation of being able to use locks, we've got to think about how to offer
      the option of a deterministic locking implementation to user-space.
      Most of the technical counter-arguments against doing priority inheritance
      only apply to kernel-space locks.  But user-space locks are different, there
      we cannot disable interrupts or make the task non-preemptible in a critical
      section, so the 'use spinlocks' argument does not apply (user-space spinlocks
      have the same priority inversion problems as other user-space locking
      constructs).  Fact is, pretty much the only technique that currently enables
      good determinism for userspace locks (such as futex-based pthread mutexes) is
      priority inheritance:
      Currently (without PI), if a high-prio and a low-prio task share a lock [this
      is a quite common scenario for most non-trivial RT applications], even if all
      critical sections are coded carefully to be deterministic (i.e.  all critical
      sections are short in duration and only execute a limited number of
      instructions), the kernel cannot guarantee any deterministic execution of the
      high-prio task: any medium-priority task could preempt the low-prio task while
      it holds the shared lock and executes the critical section, and could delay it
      As mentioned before, the userspace fastpath of PI-enabled pthread mutexes
      involves no kernel work at all - they behave quite similarly to normal
      futex-based locks: a 0 value means unlocked, and a value==TID means locked.
      (This is the same method as used by list-based robust futexes.) Userspace uses
      atomic ops to lock/unlock these mutexes without entering the kernel.
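      A minimal sketch of the fastpath just described, assuming C11 atomics.
      The helper names pi_trylock_fast and pi_unlock_fast are hypothetical and
      are not part of this patch or of glibc; only the "0 means unlocked,
      TID means locked" convention is taken from the text above.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers: only the 0 / TID convention comes from the text. */

/* Lock fastpath: atomically move the futex word from 0 (unlocked) to tid.
 * Returns false if the word was not 0, i.e. the slowpath is needed. */
static inline bool pi_trylock_fast(_Atomic uint32_t *futex, uint32_t tid)
{
    uint32_t expected = 0;
    return atomic_compare_exchange_strong(futex, &expected, tid);
}

/* Unlock fastpath: atomically move the futex word from tid back to 0.
 * Fails if the word is not exactly tid, for instance because an extra
 * bit such as FUTEX_WAITERS has been set by the kernel. */
static inline bool pi_unlock_fast(_Atomic uint32_t *futex, uint32_t tid)
{
    uint32_t expected = tid;
    return atomic_compare_exchange_strong(futex, &expected, 0);
}
```

      When either helper returns false, the caller would fall back to the
      kernel; neither helper enters the kernel itself.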
      To handle the slowpath, we have added two new futex ops:
      If the lock-acquire fastpath fails, [i.e.  an atomic transition from 0 to TID
      fails], then FUTEX_LOCK_PI is called.  The kernel does all the remaining work:
      if there is no futex-queue attached to the futex address yet then the code
      looks up the task that owns the futex [it has put its own TID into the futex
      value], and attaches a 'PI state' structure to the futex-queue.  The pi_state
      includes an rt-mutex, which is a PI-aware, kernel-based synchronization
      object.  The 'other' task is made the owner of the rt-mutex, and the
      FUTEX_WAITERS bit is atomically set in the futex value.  Then this task tries
      to lock the rt-mutex, on which it blocks.  Once it returns, it has the mutex
      acquired, and it sets the futex value to its own TID and returns.  Userspace
      has no other work to perform - it now owns the lock, and futex value contains
      If the unlock side fastpath succeeds, [i.e.  userspace manages to do a TID ->
      0 atomic transition of the futex value], then no kernel work is triggered.
      If the unlock fastpath fails (because the FUTEX_WAITERS bit is set), then
      FUTEX_UNLOCK_PI is called, and the kernel unlocks the futex on behalf of
      userspace - and it also unlocks the attached pi_state->rt_mutex and thus wakes
      up any potential waiters.
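      The lock and unlock sequences above can be sketched from the userspace
      side roughly as follows.  This is an illustration under assumptions, not
      the glibc implementation: pi_lock, pi_unlock and futex_op are
      hypothetical names, error handling is reduced to a 0 / -1 result, and
      the raw syscall() wrapper is used because there is no dedicated libc
      function for futex operations.

```c
#define _GNU_SOURCE
#include <linux/futex.h>   /* FUTEX_LOCK_PI, FUTEX_UNLOCK_PI */
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: issue a futex op that needs no value or timeout. */
static inline long futex_op(_Atomic uint32_t *uaddr, int op)
{
    return syscall(SYS_futex, (uint32_t *)uaddr, op, 0, NULL, NULL, 0);
}

/* Lock: try the 0 -> TID fastpath, otherwise let the kernel do the rest. */
static inline int pi_lock(_Atomic uint32_t *futex)
{
    uint32_t tid = (uint32_t)syscall(SYS_gettid);
    uint32_t expected = 0;

    if (atomic_compare_exchange_strong(futex, &expected, tid))
        return 0;                       /* fastpath: no kernel work */

    /* Slowpath: the kernel blocks us on the rt-mutex and, once we own
     * the lock, writes our TID into the futex word before returning. */
    return futex_op(futex, FUTEX_LOCK_PI) == 0 ? 0 : -1;
}

/* Unlock: try the TID -> 0 fastpath, otherwise ask the kernel to unlock. */
static inline int pi_unlock(_Atomic uint32_t *futex)
{
    uint32_t tid = (uint32_t)syscall(SYS_gettid);
    uint32_t expected = tid;

    if (atomic_compare_exchange_strong(futex, &expected, 0))
        return 0;                       /* fastpath: no waiters */

    /* Slowpath: the word is not exactly our TID (for example the
     * FUTEX_WAITERS bit is set), so the kernel unlocks and wakes. */
    return futex_op(futex, FUTEX_UNLOCK_PI) == 0 ? 0 : -1;
}
```

      In the uncontended case both functions return without making the futex
      syscall; the kernel is only entered when the atomic transition fails.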
      Note that under this approach, contrary to other PI-futex approaches, there is
      no prior 'registration' of a PI-futex.  [which is not quite possible anyway,
      due to existing ABI properties of pthread mutexes.]
      Also, under this scheme, 'robustness' and 'PI' are two orthogonal properties
      of futexes, and all four combinations are possible: futex, robust-futex,
      PI-futex, robust+PI-futex.
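      Seen from the POSIX mutex API, the four combinations might be selected
      as in the sketch below.  It assumes a C library that provides
      pthread_mutexattr_setrobust() (the POSIX.1-2008 name, which postdates
      this patch) and PTHREAD_PRIO_INHERIT; init_mutex is a hypothetical
      helper, not something this patch or glibc defines.

```c
#define _GNU_SOURCE
#include <pthread.h>

/* Hypothetical helper: initialise a mutex that is optionally robust
 * and/or priority-inheriting.  Returns 0 or a pthread error number. */
static inline int init_mutex(pthread_mutex_t *m, int robust, int pi)
{
    pthread_mutexattr_t attr;
    int err = pthread_mutexattr_init(&attr);

    if (err)
        return err;
    if (robust)
        err = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    if (!err && pi)
        err = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (!err)
        err = pthread_mutex_init(m, &attr);

    pthread_mutexattr_destroy(&attr);
    return err;
}
```

      Calling init_mutex(&m, 0, 0), (1, 0), (0, 1) and (1, 1) would correspond
      to futex, robust-futex, PI-futex and robust+PI-futex respectively,
      since the two attributes are set independently of each other.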
        glibc support:
      Ulrich Drepper and Jakub Jelinek have written glibc support for PI-futexes
      (and robust futexes), enabling robust and PI (PTHREAD_PRIO_INHERIT) POSIX
      mutexes.  (PTHREAD_PRIO_PROTECT support will be added later on too, no
      additional kernel changes are needed for that).  [NOTE: The glibc patch is
      obviously unofficial and unsupported without matching upstream kernel
      the patch-queue and the glibc patch can also be downloaded from:
      Many thanks go to the people who helped us create this kernel feature: Steven
      Rostedt, Esben Nielsen, Benedikt Spranger, Daniel Walker, John Cooper, Arjan
      van de Ven, Oleg Nesterov and others.  Credits for related prior projects go
      to Dirk Grambow, Inaky Perez-Gonzalez, Bill Huey and many others.
      Clean up the futex code, before adding more features to it:
       - use u32 as the futex field type - that's the ABI
       - use __user and pointers to u32 instead of unsigned long
       - code style / comment style cleanups
       - rename hash-bucket name from 'bh' to 'hb'.
      I checked the pre and post futex.o object files to make sure this
      patch has no code effects.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  23. 27 Mar, 2006 3 commits
  24. 07 Sep, 2005 1 commit
    • [PATCH] FUTEX_WAKE_OP: pthread_cond_signal() speedup · 4732efbe
      Jakub Jelinek authored
      ATM pthread_cond_signal is unnecessarily slow, because it wakes one waiter
      (which at least on UP usually means an immediate context switch to one of
      the waiter threads).  This waiter wakes up and after a few instructions it
      attempts to acquire the cv internal lock, but that lock is still held by
      the thread calling pthread_cond_signal.  So it goes to sleep and eventually
      the signalling thread is scheduled in, unlocks the internal lock and wakes
      the waiter again.
      Now, before 2003-09-21 NPTL was using FUTEX_REQUEUE in pthread_cond_signal
      to avoid this performance issue, but it was removed when locks were
      redesigned to the 3 state scheme (unlocked, locked uncontended, locked
      The following scenario shows why simply using FUTEX_REQUEUE in
      pthread_cond_signal together with using lll_mutex_unlock_force in place of
      lll_mutex_unlock is not enough and probably why it has been disabled at
      that time:
      The number is the value of cv->__data.__lock.
              thr1            thr2            thr3
      0       pthread_cond_wait
      1       lll_mutex_lock (cv->__data.__lock)
      0       lll_mutex_unlock (cv->__data.__lock)
      0       lll_futex_wait (&cv->__data.__futex, futexval)
      0                       pthread_cond_signal
      1                       lll_mutex_lock (cv->__data.__lock)
      1                                       pthread_cond_signal
      2                                       lll_mutex_lock (cv->__data.__lock)
      2                                         lll_futex_wait (&cv->__data.__lock, 2)
      2                       lll_futex_requeue (&cv->__data.__futex, 0, 1, &cv->__data.__lock)
                                # FUTEX_REQUEUE, not FUTEX_CMP_REQUEUE
      2                       lll_mutex_unlock_force (cv->__data.__lock)
      0                         cv->__data.__lock = 0
      0                         lll_futex_wake (&cv->__data.__lock, 1)
      1       lll_mutex_lock (cv->__data.__lock)
      0       lll_mutex_unlock (cv->__data.__lock)
                # Here, lll_mutex_unlock doesn't know there are threads waiting
                # on the internal cv's lock
      Now, I believe it is possible to use FUTEX_REQUEUE in pthread_cond_signal,
      but it will cost us not one, but 2 extra syscalls and, what's worse, one of
      these extra syscalls will be done for every single waiting loop in
      We would need to use lll_mutex_unlock_force in pthread_cond_signal after
      requeue and lll_mutex_cond_lock in pthread_cond_*wait after lll_futex_wait.
      Another alternative is to do the unlocking pthread_cond_signal needs to do
      (the lock can't be unlocked before lll_futex_wake, as that is racy) in the
      I have implemented both variants, futex-requeue-glibc.patch is the first
      one and futex-wake_op{,-glibc}.patch is the unlocking inside of the kernel.
      The kernel interface allows userland to specify exactly what an unlocking
      operation should look like (some atomic arithmetic operation with optional
      constant argument and comparison of the previous futex value with another
      It has been implemented just for ppc*, x86_64 and i?86, for other
      architectures I'm including just a stub header which can be used as a
      starting point by maintainers to write support for their arches and ATM
      will just return -ENOSYS for FUTEX_WAKE_OP.  The requeue patch has been
      (lightly) tested just on x86_64, the wake_op patch on ppc64 kernel running
      32-bit and 64-bit NPTL and x86_64 kernel running 32-bit and 64-bit NPTL.
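      To illustrate the userland-specified operation mentioned above, here is
      a userspace model of how an encoded operation could be interpreted.  It
      relies on the FUTEX_OP() macro and the FUTEX_OP_* / FUTEX_OP_CMP_*
      constants from <linux/futex.h>; model_wake_op and sext12 are
      hypothetical names.  The model is deliberately non-atomic and ignores
      the FUTEX_OP_OPARG_SHIFT flag, so it is a sketch of the semantics and
      not the kernel code.

```c
#include <linux/futex.h>  /* FUTEX_OP(), FUTEX_OP_*, FUTEX_OP_CMP_* */
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the low 12 bits of x (both arguments are 12-bit fields). */
static inline int32_t sext12(uint32_t x)
{
    int32_t v = (int32_t)(x & 0xfff);
    return (v & 0x800) ? v - 0x1000 : v;
}

/* Hypothetical model: apply the encoded arithmetic operation to *word
 * and report whether the comparison against the OLD value holds. */
static inline bool model_wake_op(int32_t *word, uint32_t encoded)
{
    int op = (encoded >> 28) & 0x7;
    int cmp = (encoded >> 24) & 0xf;
    int32_t oparg = sext12(encoded >> 12);
    int32_t cmparg = sext12(encoded);
    int32_t old = *word;

    switch (op) {
    case FUTEX_OP_SET:  *word = oparg; break;
    case FUTEX_OP_ADD:  *word = (int32_t)((uint32_t)old + (uint32_t)oparg); break;
    case FUTEX_OP_OR:   *word = old | oparg; break;
    case FUTEX_OP_ANDN: *word = old & ~oparg; break;
    case FUTEX_OP_XOR:  *word = old ^ oparg; break;
    default:            return false;   /* unknown operation */
    }

    switch (cmp) {
    case FUTEX_OP_CMP_EQ: return old == cmparg;
    case FUTEX_OP_CMP_NE: return old != cmparg;
    case FUTEX_OP_CMP_LT: return old <  cmparg;
    case FUTEX_OP_CMP_LE: return old <= cmparg;
    case FUTEX_OP_CMP_GT: return old >  cmparg;
    case FUTEX_OP_CMP_GE: return old >= cmparg;
    default:              return false; /* unknown comparison */
    }
}
```

      For example, FUTEX_OP(FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1) encodes "set
      the word to 0 and report whether the old value was greater than 1",
      which is the shape of unlock a three-state lock word would need; whether
      glibc uses exactly this encoding is not stated in this message.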
      With the following benchmark on UP x86-64 I get:
      for i in nptl-orig nptl-requeue nptl-wake_op; do echo time elf/ld.so --library-path .:$i /tmp/bench; \
      for j in 1 2; do echo ( time elf/ld.so --library-path .:$i /tmp/bench ) 2>&1; done; done
      time elf/ld.so --library-path .:nptl-orig /tmp/bench
      real 0m0.655s user 0m0.253s sys 0m0.403s
      real 0m0.657s user 0m0.269s sys 0m0.388s
      time elf/ld.so --library-path .:nptl-requeue /tmp/bench
      real 0m0.496s user 0m0.225s sys 0m0.271s
      real 0m0.531s user 0m0.242s sys 0m0.288s
      time elf/ld.so --library-path .:nptl-wake_op /tmp/bench
      real 0m0.380s user 0m0.176s sys 0m0.204s
      real 0m0.382s user 0m0.175s sys 0m0.207s
      The benchmark is at:
      Older futex-requeue-glibc.patch version is at:
      Older futex-wake_op-glibc.patch version is at:
      Will post a new version (just x86-64 fixes so that the patch
      applies against pthread_cond_signal.S) to libc-hacker ml soon.
      Attached is the kernel FUTEX_WAKE_OP patch as well as a simple-minded
      testcase that will not test the atomicity of the operation, but at least
      check if the threads that should have been woken up are woken up and
      whether the arithmetic operation in the kernel gave the expected results.
      Acked-by: Ingo Molnar <mingo@redhat.com>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Cc: Jamie Lokier <jamie@shareable.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  25. 16 Apr, 2005 1 commit
    • Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds authored
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      Let it rip!