1. 30 Jul, 2009 1 commit
    • Dave Hansen's avatar
      lib: flexible array implementation · 534acc05
      Dave Hansen authored
      Once a structure goes over PAGE_SIZE*2, we see occasional allocation
      failures.  Some people have chosen to switch over to things like vmalloc()
      that will let them keep array-like access to such a large structures.
      But, vmalloc() has plenty of downsides.
      Here's an alternative.  I think it's what Andrew was suggesting here:
      I call it a flexible array.  It does all of its work in PAGE_SIZE bits, so
      never does an order>0 allocation.  The base level has
      PAGE_SIZE-2*sizeof(int) bytes of storage for pointers to the second level.
       So, with a 32-bit arch, you get about 4MB (4183112 bytes) of total
      storage when the objects pack nicely into a page.  It is half that on
      64-bit because the pointers are twice the size.  There's a table detailing
      this in the code.
      There are kerneldocs for the functions, but here's an
      flex_array_alloc() - dynamically allocate a base structure
      flex_array_free() - free the array and all of the
      		    second-level pages
      flex_array_free_parts() - free the second-level pages, but
      			  not the base (for static bases)
      flex_array_put() - copy into the array at the given index
      flex_array_get() - copy out of the array at the given index
      flex_array_prealloc() - preallocate the second-level pages
      			between the given indexes to
      			guarantee no allocs will occur at
      			put() time.
      We could also potentially just pass the "element_size" into each of the
      API functions instead of storing it internally.  That would get us one
      more base pointer on 32-bit.
      I've been testing this by running it in userspace.  The header and patch
      that I've been using are here, as well as the little script I'm using to
      generate the size table which goes in the kerneldocs.
      [[email protected]: coding-style fixes]
      Signed-off-by: default avatarDave Hansen <[email protected]>
      Reviewed-by: default avatarKAMEZAWA Hiroyuki <[email protected]>
      Signed-off-by: default avatarAndrew Morton <[email protected]>
      Signed-off-by: default avatarLinus Torvalds <[email protected]>
  2. 18 Jun, 2009 1 commit
  3. 15 Jun, 2009 1 commit
    • Paul Mackerras's avatar
      lib: Provide generic atomic64_t implementation · 09d4e0ed
      Paul Mackerras authored
      Many processor architectures have no 64-bit atomic instructions, but
      we need atomic64_t in order to support the perf_counter subsystem.
      This adds an implementation of 64-bit atomic operations using hashed
      spinlocks to provide atomicity.  For each atomic operation, the address
      of the atomic64_t variable is hashed to an index into an array of 16
      spinlocks.  That spinlock is taken (with interrupts disabled) around the
      operation, which can then be coded non-atomically within the lock.
      On UP, all the spinlock manipulation goes away and we simply disable
      interrupts around each operation.  In fact gcc eliminates the whole
      atomic64_lock variable as well.
      Signed-off-by: default avatarPaul Mackerras <[email protected]>
      Signed-off-by: default avatarBenjamin Herrenschmidt <[email protected]>
  4. 11 Jun, 2009 2 commits
  5. 24 Apr, 2009 1 commit
  6. 24 Mar, 2009 1 commit
    • Jason Baron's avatar
      dynamic debug: combine dprintk and dynamic printk · e9d376f0
      Jason Baron authored
      This patch combines Greg Bank's dprintk() work with the existing dynamic
      printk patchset, we are now calling it 'dynamic debug'.
      The new feature of this patchset is a richer /debugfs control file interface,
      (an example output from my system is at the bottom), which allows fined grained
      control over the the debug output. The output can be controlled by function,
      file, module, format string, and line number.
      for example, enabled all debug messages in module 'nf_conntrack':
      echo -n 'module nf_conntrack +p' > /mnt/debugfs/dynamic_debug/control
      to disable them:
      echo -n 'module nf_conntrack -p' > /mnt/debugfs/dynamic_debug/control
      A further explanation can be found in the documentation patch.
      Signed-off-by: default avatarGreg Banks <[email protected]>
      Signed-off-by: default avatarJason Baron <[email protected]>
      Signed-off-by: default avatarGreg Kroah-Hartman <[email protected]>
  7. 05 Mar, 2009 1 commit
  8. 04 Mar, 2009 1 commit
  9. 16 Jan, 2009 1 commit
    • Peter Zijlstra's avatar
      sched: make plist a library facility · ceacc2c1
      Peter Zijlstra authored
      Ingo Molnar wrote:
      > here's a new build failure with tip/sched/rt:
      >   LD      .tmp_vmlinux1
      > kernel/built-in.o: In function `set_curr_task_rt':
      > sched.c:(.text+0x3675): undefined reference to `plist_del'
      > kernel/built-in.o: In function `pick_next_task_rt':
      > sched.c:(.text+0x37ce): undefined reference to `plist_del'
      > kernel/built-in.o: In function `enqueue_pushable_task':
      > sched.c:(.text+0x381c): undefined reference to `plist_del'
      Eliminate the plist library kconfig and make it available
      Signed-off-by: default avatarPeter Zijlstra <[email protected]>
      Signed-off-by: default avatarIngo Molnar <[email protected]>
  10. 08 Jan, 2009 1 commit
  11. 05 Jan, 2009 1 commit
  12. 04 Jan, 2009 1 commit
    • Alain Knaff's avatar
      bzip2/lzma: config and initramfs support for bzip2/lzma decompression · 30d65dbf
      Alain Knaff authored
      Impact: New code for initramfs decompression, new features
      This is the second part of the bzip2/lzma patch
      The bzip patch is based on an idea by Christian Ludwig, includes support for
      compressing the kernel with bzip2 or lzma rather than gzip. Both
      compressors give smaller sizes than gzip.  Lzma's decompresses faster
      than bzip2.
      It also supports ramdisks and initramfs' compressed using these two
      The functionality has been successfully used for a couple of years by
      the udpcast project
      This version applies to "tip" kernel 2.6.28
      This part contains:
      - support for new compressions (bzip2 and lzma) in initramfs and
      old-style ramdisk
      - config dialog for kernel compression (but new kernel compressions
      not yet supported)
      Signed-off-by: default avatarAlain Knaff <[email protected]>
      Signed-off-by: default avatarH. Peter Anvin <[email protected]>
  13. 31 Dec, 2008 1 commit
  14. 13 Nov, 2008 1 commit
    • David Howells's avatar
      CRED: Inaugurate COW credentials · d84f4f99
      David Howells authored
      Inaugurate copy-on-write credentials management.  This uses RCU to manage the
      credentials pointer in the task_struct with respect to accesses by other tasks.
      A process may only modify its own credentials, and so does not need locking to
      access or modify its own credentials.
      A mutex (cred_replace_mutex) is added to the task_struct to control the effect
      of PTRACE_ATTACHED on credential calculations, particularly with respect to
      With this patch, the contents of an active credentials struct may not be
      changed directly; rather a new set of credentials must be prepared, modified
      and committed using something like the following sequence of events:
      	struct cred *new = prepare_creds();
      	int ret = blah(new);
      	if (ret < 0) {
      		return ret;
      	return commit_creds(new);
      There are some exceptions to this rule: the keyrings pointed to by the active
      credentials may be instantiated - keyrings violate the COW rule as managing
      COW keyrings is tricky, given that it is possible for a task to directly alter
      the keys in a keyring in use by another task.
      To help enforce this, various pointers to sets of credentials, such as those in
      the task_struct, are declared const.  The purpose of this is compile-time
      discouragement of altering credentials through those pointers.  Once a set of
      credentials has been made public through one of these pointers, it may not be
      modified, except under special circumstances:
        (1) Its reference count may incremented and decremented.
        (2) The keyrings to which it points may be modified, but not replaced.
      The only safe way to modify anything else is to create a replacement and commit
      using the functions described in Documentation/credentials.txt (which will be
      added by a later patch).
      This patch and the preceding patches have been tested with the LTP SELinux
      This patch makes several logical sets of alteration:
       (1) execve().
           This now prepares and commits credentials in various places in the
           security code rather than altering the current creds directly.
       (2) Temporary credential overrides.
           do_coredump() and sys_faccessat() now prepare their own credentials and
           temporarily override the ones currently on the acting thread, whilst
           preventing interference from other threads by holding cred_replace_mutex
           on the thread being dumped.
           This will be replaced in a future patch by something that hands down the
           credentials directly to the functions being called, rather than altering
           the task's objective credentials.
       (3) LSM interface.
           A number of functions have been changed, added or removed:
           (*) security_capset_check(), ->capset_check()
           (*) security_capset_set(), ->capset_set()
           	 Removed in favour of security_capset().
           (*) security_capset(), ->capset()
           	 New.  This is passed a pointer to the new creds, a pointer to the old
           	 creds and the proposed capability sets.  It should fill in the new
           	 creds or return an error.  All pointers, barring the pointer to the
           	 new creds, are now const.
           (*) security_bprm_apply_creds(), ->bprm_apply_creds()
           	 Changed; now returns a value, which will cause the process to be
           	 killed if it's an error.
           (*) security_task_alloc(), ->task_alloc_security()
           	 Removed in favour of security_prepare_creds().
           (*) security_cred_free(), ->cred_free()
           	 New.  Free security data attached to cred->security.
           (*) security_prepare_creds(), ->cred_prepare()
           	 New. Duplicate any security data attached to cred->security.
           (*) security_commit_creds(), ->cred_commit()
           	 New. Apply any security effects for the upcoming installation of new
           	 security by commit_creds().
           (*) security_task_post_setuid(), ->task_post_setuid()
           	 Removed in favour of security_task_fix_setuid().
           (*) security_task_fix_setuid(), ->task_fix_setuid()
           	 Fix up the proposed new credentials for setuid().  This is used by
           	 cap_set_fix_setuid() to implicitly adjust capabilities in line with
           	 setuid() changes.  Changes are made to the new credentials, rather
           	 than the task itself as in security_task_post_setuid().
           (*) security_task_reparent_to_init(), ->task_reparent_to_init()
           	 Removed.  Instead the task being reparented to init is referred
           	 directly to init's credentials.
      	 NOTE!  This results in the loss of some state: SELinux's osid no
      	 longer records the sid of the thread that forked it.
           (*) security_key_alloc(), ->key_alloc()
           (*) security_key_permission(), ->key_permission()
           	 Changed.  These now take cred pointers rather than task pointers to
           	 refer to the security context.
       (4) sys_capset().
           This has been simplified and uses less locking.  The LSM functions it
           calls have been merged.
       (5) reparent_to_kthreadd().
           This gives the current thread the same credentials as init by simply using
           commit_thread() to point that way.
       (6) __sigqueue_alloc() and switch_uid()
           __sigqueue_alloc() can't stop the target task from changing its creds
           beneath it, so this function gets a reference to the currently applicable
           user_struct which it then passes into the sigqueue struct it returns if
           switch_uid() is now called from commit_creds(), and possibly should be
           folded into that.  commit_creds() should take care of protecting
       (7) [sg]et[ug]id() and co and [sg]et_current_groups.
           The set functions now all use prepare_creds(), commit_creds() and
           abort_creds() to build and check a new set of credentials before applying
           security_task_set[ug]id() is called inside the prepared section.  This
           guarantees that nothing else will affect the creds until we've finished.
           The calling of set_dumpable() has been moved into commit_creds().
           Much of the functionality of set_user() has been moved into
           The get functions all simply access the data directly.
       (8) security_task_prctl() and cap_task_prctl().
           security_task_prctl() has been modified to return -ENOSYS if it doesn't
           want to handle a function, or otherwise return the return value directly
           rather than through an argument.
           Additionally, cap_task_prctl() now prepares a new set of credentials, even
           if it doesn't end up using it.
       (9) Keyrings.
           A number of changes have been made to the keyrings code:
           (a) switch_uid_keyring(), copy_keys(), exit_keys() and suid_keys() have
           	 all been dropped and built in to the credentials functions directly.
           	 They may want separating out again later.
           (b) key_alloc() and search_process_keyrings() now take a cred pointer
           	 rather than a task pointer to specify the security context.
           (c) copy_creds() gives a new thread within the same thread group a new
           	 thread keyring if its parent had one, otherwise it discards the thread
           (d) The authorisation key now points directly to the credentials to extend
           	 the search into rather pointing to the task that carries them.
           (e) Installing thread, process or session keyrings causes a new set of
           	 credentials to be created, even though it's not strictly necessary for
           	 process or session keyrings (they're shared).
      (10) Usermode helper.
           The usermode helper code now carries a cred struct pointer in its
           subprocess_info struct instead of a new session keyring pointer.  This set
           of credentials is derived from init_cred and installed on the new process
           after it has been cloned.
           call_usermodehelper_setup() allocates the new credentials and
           call_usermodehelper_freeinfo() discards them if they haven't been used.  A
           special cred function (prepare_usermodeinfo_creds()) is provided
           specifically for call_usermodehelper_setup() to call.
           call_usermodehelper_setkeys() adjusts the credentials to sport the
           supplied keyring as the new session keyring.
      (11) SELinux.
           SELinux has a number of changes, in addition to those to support the LSM
           interface changes mentioned above:
           (a) selinux_setprocattr() no longer does its check for whether the
           	 current ptracer can access processes with the new SID inside the lock
           	 that covers getting the ptracer's SID.  Whilst this lock ensures that
           	 the check is done with the ptracer pinned, the result is only valid
           	 until the lock is released, so there's no point doing it inside the
      (12) is_single_threaded().
           This function has been extracted from selinux_setprocattr() and put into
           a file of its own in the lib/ directory as join_session_keyring() now
           wants to use it too.
           The code in SELinux just checked to see whether a task shared mm_structs
           with other tasks (CLONE_VM), but that isn't good enough.  We really want
           to know if they're part of the same thread group (CLONE_THREAD).
      (13) nfsd.
           The NFS server daemon now has to use the COW credentials to set the
           credentials it is going to use.  It really needs to pass the credentials
           down to the functions it calls, but it can't do that until other patches
           in this series have been applied.
      Signed-off-by: default avatarDavid Howells <[email protected]>
      Acked-by: default avatarJames Morris <[email protected]>
      Signed-off-by: default avatarJames Morris <[email protected]>
  15. 20 Oct, 2008 1 commit
  16. 16 Oct, 2008 1 commit
    • Jason Baron's avatar
      driver core: basic infrastructure for per-module dynamic debug messages · 346e15be
      Jason Baron authored
      Base infrastructure to enable per-module debug messages.
      I've introduced CONFIG_DYNAMIC_PRINTK_DEBUG, which when enabled centralizes
      control of debugging statements on a per-module basis in one /proc file,
      currently, <debugfs>/dynamic_printk/modules. When, CONFIG_DYNAMIC_PRINTK_DEBUG,
      is not set, debugging statements can still be enabled as before, often by
      defining 'DEBUG' for the proper compilation unit. Thus, this patch set has no
      affect when CONFIG_DYNAMIC_PRINTK_DEBUG is not set.
      The infrastructure currently ties into all pr_debug() and dev_dbg() calls. That
      is, if CONFIG_DYNAMIC_PRINTK_DEBUG is set, all pr_debug() and dev_dbg() calls
      can be dynamically enabled/disabled on a per-module basis.
      Future plans include extending this functionality to subsystems, that define 
      their own debug levels and flags.
      Dynamic debugging is controlled by the debugfs file, 
      <debugfs>/dynamic_printk/modules. This file contains a list of the modules that
      can be enabled. The format of the file is as follows:
      	<module_name> <enabled=0/1>
      	<module_name> : Name of the module in which the debug call resides
      	<enabled=0/1> : whether the messages are enabled or not
      For example:
      	snd_hda_intel enabled=0
      	fixup enabled=1
      	driver enabled=0
      Enable a module:
      	$echo "set enabled=1 <module_name>" > dynamic_printk/modules
      Disable a module:
      	$echo "set enabled=0 <module_name>" > dynamic_printk/modules
      Enable all modules:
      	$echo "set enabled=1 all" > dynamic_printk/modules
      Disable all modules:
      	$echo "set enabled=0 all" > dynamic_printk/modules
      Finally, passing "dynamic_printk" at the command line enables
      debugging for all modules. This mode can be turned off via the above
      disable command.
      [gkh: minor cleanups and tweaks to make the build work quietly]
      Signed-off-by: default avatarJason Baron <[email protected]>
      Signed-off-by: default avatarGreg Kroah-Hartman <[email protected]>
  17. 03 Oct, 2008 1 commit
  18. 26 Jul, 2008 2 commits
  19. 24 Jul, 2008 1 commit
  20. 17 Jul, 2008 2 commits
    • Ingo Molnar's avatar
      ftrace: do not trace library functions · 2464a609
      Ingo Molnar authored
      make function tracing more robust: do not trace library functions.
      We've already got a sizable list of exceptions:
       ifdef CONFIG_FTRACE
       # Do not profile string.o, since it may be used in early boot or vdso
       CFLAGS_REMOVE_string.o = -pg
       # Also do not profile any debug utilities
       CFLAGS_REMOVE_spinlock_debug.o = -pg
       CFLAGS_REMOVE_list_debug.o = -pg
       CFLAGS_REMOVE_debugobjects.o = -pg
       CFLAGS_REMOVE_find_next_bit.o = -pg
       CFLAGS_REMOVE_cpumask.o = -pg
       CFLAGS_REMOVE_bitmap.o = -pg
      ... and the pattern has been that random library functionality showed
      up in ftrace's critical path (outside of its recursion check), causing
      hard to debug lockups.
      So be a bit defensive about it and exclude all lib/*.o functions by
      default. It's not that they are overly interesting for tracing purposes
      anyway. Specific ones can still be traced, in an opt-in manner.
      Signed-off-by: default avatarIngo Molnar <[email protected]>
    • Ingo Molnar's avatar
      ftrace: fix lockup with MAXSMP · 9fa11137
      Ingo Molnar authored
      MAXSMP brings in lots of use of various bitops in smp_processor_id()
      and friends - causing ftrace to lock up during bootup:
        calling  anon_inode_init+0x0/0x130
        initcall anon_inode_init+0x0/0x130 returned 0 after 0 msecs
        calling  acpi_event_init+0x0/0x57
        [ hard hang ]
      So exclude the bitops facilities from tracing.
      Signed-off-by: default avatarIngo Molnar <[email protected]>
  21. 12 Jul, 2008 1 commit
  22. 23 May, 2008 3 commits
  23. 30 Apr, 2008 1 commit
    • Thomas Gleixner's avatar
      infrastructure to debug (dynamic) objects · 3ac7fe5a
      Thomas Gleixner authored
      We can see an ever repeating problem pattern with objects of any kind in the
      1) freeing of active objects
      2) reinitialization of active objects
      Both problems can be hard to debug because the crash happens at a point where
      we have no chance to decode the root cause anymore.  One problem spot are
      kernel timers, where the detection of the problem often happens in interrupt
      context and usually causes the machine to panic.
      While working on a timer related bug report I had to hack specialized code
      into the timer subsystem to get a reasonable hint for the root cause.  This
      debug hack was fine for temporary use, but far from a mergeable solution due
      to the intrusiveness into the timer code.
      The code further lacked the ability to detect and report the root cause
      instantly and keep the system operational.
      Keeping the system operational is important to get hold of the debug
      information without special debugging aids like serial consoles and special
      knowledge of the bug reporter.
      The problems described above are not restricted to timers, but timers tend to
      expose it usually in a full system crash.  Other objects are less explosive,
      but the symptoms caused by such mistakes can be even harder to debug.
      Instead of creating specialized debugging code for the timer subsystem a
      generic infrastructure is created which allows developers to verify their code
      and provides an easy to enable debug facility for users in case of trouble.
      The debugobjects core code keeps track of operations on static and dynamic
      objects by inserting them into a hashed list and sanity checking them on
      object operations and provides additional checks whenever kernel memory is
      The tracked object operations are:
      - initializing an object
      - adding an object to a subsystem list
      - deleting an object from a subsystem list
      Each operation is sanity checked before the operation is executed and the
      subsystem specific code can provide a fixup function which allows to prevent
      the damage of the operation.  When the sanity check triggers a warning message
      and a stack trace is printed.
      The list of operations can be extended if the need arises.  For now it's
      limited to the requirements of the first user (timers).
      The core code enqueues the objects into hash buckets.  The hash index is
      generated from the address of the object to simplify the lookup for the check
      on kfree/vfree.  Each bucket has it's own spinlock to avoid contention on a
      global lock.
      The debug code can be compiled in without being active.  The runtime overhead
      is minimal and could be optimized by asm alternatives.  A kernel command line
      option enables the debugging code.
      Thanks to Ingo Molnar for review, suggestions and cleanup patches.
      Signed-off-by: default avatarThomas Gleixner <[email protected]>
      Signed-off-by: default avatarIngo Molnar <[email protected]>
      Cc: Greg KH <[email protected]>
      Cc: Randy Dunlap <[email protected]>
      Cc: Kay Sievers <[email protected]>
      Signed-off-by: default avatarAndrew Morton <[email protected]>
      Signed-off-by: default avatarLinus Torvalds <[email protected]>
  24. 29 Apr, 2008 1 commit
  25. 26 Apr, 2008 1 commit
    • Alexander van Heukelum's avatar
      x86: generic versions of find_first_(zero_)bit, convert i386 · 77b9bd9c
      Alexander van Heukelum authored
      Generic versions of __find_first_bit and __find_first_zero_bit
      are introduced as simplified versions of __find_next_bit and
      __find_next_zero_bit. Their compilation and use are guarded by
      a new config variable GENERIC_FIND_FIRST_BIT.
      The generic versions of find_first_bit and find_first_zero_bit
      are implemented in terms of the newly introduced __find_first_bit
      and __find_first_zero_bit.
      This patch does not remove the i386-specific implementation,
      but it does switch i386 to use the generic functions by setting
      GENERIC_FIND_FIRST_BIT=y for X86_32.
      Signed-off-by: default avatarAlexander van Heukelum <[email protected]>
      Signed-off-by: default avatarIngo Molnar <[email protected]>
  26. 17 Apr, 2008 1 commit
  27. 28 Mar, 2008 1 commit
  28. 14 Feb, 2008 1 commit
  29. 08 Feb, 2008 1 commit
  30. 05 Feb, 2008 1 commit
  31. 28 Jan, 2008 2 commits
  32. 15 Nov, 2007 1 commit
  33. 19 Oct, 2007 1 commit
  34. 17 Oct, 2007 1 commit
    • Peter Zijlstra's avatar
      lib: floating proportions · 145ca25e
      Peter Zijlstra authored
      Given a set of objects, floating proportions aims to efficiently give the
      proportional 'activity' of a single item as compared to the whole set. Where
      'activity' is a measure of a temporal property of the items.
      It is efficient in that it need not inspect any other items of the set
      in order to provide the answer. It is not even needed to know how many
      other items there are.
      It has one parameter, and that is the period of 'time' over which the
      'activity' is measured.
      Signed-off-by: default avatarPeter Zijlstra <[email protected]>
      Signed-off-by: default avatarAndrew Morton <[email protected]>
      Signed-off-by: default avatarLinus Torvalds <[email protected]>