1. 09 Mar, 2018 2 commits
  2. 04 Dec, 2017 1 commit
  3. 16 Nov, 2017 1 commit
  4. 08 Nov, 2017 1 commit
  5. 11 Apr, 2017 1 commit
    • sched/core: Remove 'task' parameter and rename tsk_restore_flags() to current_restore_flags() · 717a94b5
      NeilBrown authored
      It is not safe for one thread to modify the ->flags
      of another thread as there is no locking that can protect
      the update.
      
      So tsk_restore_flags(), which takes a task pointer and modifies
      the flags, is an invitation to do the wrong thing.
      
      All current users pass "current" as the task, so no developers have
      accepted that invitation.  It would be best to ensure it remains
      that way.
      
      So rename tsk_restore_flags() to current_restore_flags() and don't
      pass in a task_struct pointer.  Always operate on current->flags.
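      
      For reference, a minimal sketch of the save/set/restore pattern this
      results in, using PF_MEMALLOC as an illustrative flag bit:
      
        /* Save, set, and later restore a flag bit on current only. */
        unsigned int pflags = current->flags & PF_MEMALLOC;
      
        current->flags |= PF_MEMALLOC;
        /* ... do work that must not recurse into memory reclaim ... */
        current_restore_flags(pflags, PF_MEMALLOC);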
      Signed-off-by: NeilBrown <neilb@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  6. 21 Oct, 2016 1 commit
  7. 10 Oct, 2016 1 commit
    • latent_entropy: Mark functions with __latent_entropy · 0766f788
      Emese Revfy authored
      The __latent_entropy gcc attribute can be used only on functions and
      variables.  If it is on a function then the plugin will instrument it for
      gathering control-flow entropy. If the attribute is on a variable then
      the plugin will initialize it with random contents.  The variable must
      be an integer, an integer array type or a structure with integer fields.
      
      These specific functions have been selected because they are init
      functions (to help gather boot-time entropy), are called at unpredictable
      times, or have variable loops, each of which provides some level of
      latent entropy.
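      
      An illustrative sketch of both placements; the identifiers seed_pool and
      example_softirq_action are made up for the example:
      
        /* On an integer array: the plugin seeds it with build-time randomness. */
        static u32 seed_pool[16] __latent_entropy;
      
        /* On a function: the plugin instruments its basic blocks to mix
         * control-flow entropy into a function-local accumulator. */
        static __latent_entropy void example_softirq_action(struct softirq_action *h)
        {
                /* ... handler body ... */
        }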
      Signed-off-by: Emese Revfy <re.emese@gmail.com>
      [kees: expanded commit message]
      Signed-off-by: Kees Cook <keescook@chromium.org>
  8. 30 Sep, 2016 1 commit
    • softirq: Let ksoftirqd do its job · 4cd13c21
      Eric Dumazet authored
      A while back, Paolo and Hannes sent an RFC patch adding support for
      threadable NAPI poll loops (https://patchwork.ozlabs.org/patch/620657/).
      
      The problem is that softirqs are very aggressive and are often handled
      by the current process, even under stress when ksoftirqd has been
      scheduled precisely so that innocent threads would have a better chance
      to make progress.
      
      This patch makes sure that if ksoftirqd is running, we let it
      perform the softirq work.
      
      Jonathan Corbet summarized the issue in https://lwn.net/Articles/687617/
      
      Tested:
      
       - NIC receiving traffic handled by CPU 0
       - UDP receiver running on CPU 0, using a single UDP socket.
       - Incoming flood of UDP packets targeting the UDP socket.
      
      Before the patch, the UDP receiver could almost never get CPU cycles and
      could only receive ~2,000 packets per second.
      
      After the patch, CPU cycles are split 50/50 between the user application
      and ksoftirqd/0, and we can effectively read ~900,000 packets per second,
      a huge improvement in this DoS situation. (Note that more packets are now
      dropped by the NIC itself, since the BH handlers get fewer CPU cycles to
      drain the RX ring buffer.)
      
      Since the load now runs in a well-identified thread context, an admin can
      more easily tune process scheduling parameters if needed.
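      
      The core of the change is a per-CPU check along these lines (a sketch
      close to the patch; details may differ from the final code):
      
        static bool ksoftirqd_running(void)
        {
                struct task_struct *tsk = __this_cpu_read(ksoftirqd);
      
                /* If ksoftirqd was woken and is runnable, defer to it. */
                return tsk && (tsk->state == TASK_RUNNING);
        }
      
        /* Callers such as invoke_softirq() then bail out early: */
        if (ksoftirqd_running())
                return;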
      Reported-by: Paolo Abeni <pabeni@redhat.com>
      Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Hannes Frederic Sowa <hannes@redhat.com>
      Cc: Jesper Dangaard Brouer <jbrouer@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1472665349.14381.356.camel@edumazet-glaptop3.roam.corp.google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  9. 06 Sep, 2016 1 commit
  10. 25 Mar, 2016 1 commit
  11. 29 Feb, 2016 1 commit
  12. 14 Jan, 2015 3 commits
    • ksoftirqd: Use new cond_resched_rcu_qs() function · 60479676
      Paul E. McKenney authored
      Simplify run_ksoftirqd() by using the new cond_resched_rcu_qs() function
      that conditionally reschedules, but unconditionally supplies an RCU
      quiescent state.  This commit is separate from the previous commit by
      Calvin Owens because Calvin's approach can be backported, while this
      commit cannot be.  The reason that this commit cannot be backported is
      that cond_resched_rcu_qs() does not always provide the needed quiescent
      state in earlier kernels.
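      
      The simplified loop body looks roughly like this, assuming the
      pre-existing run_ksoftirqd() structure:
      
        static void run_ksoftirqd(unsigned int cpu)
        {
                local_irq_disable();
                if (local_softirq_pending()) {
                        __do_softirq();
                        local_irq_enable();
                        /* Reschedules if needed and always supplies an RCU
                         * quiescent state; replaces the old cond_resched()
                         * plus rcu_note_context_switch() pair. */
                        cond_resched_rcu_qs();
                        return;
                }
                local_irq_enable();
        }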
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • ksoftirqd: Enable IRQs and call cond_resched() before poking RCU · 28423ad2
      Calvin Owens authored
      While debugging an issue with excessive softirq usage, I encountered the
      following note in commit 3e339b5d ("softirq: Use hotplug thread
      infrastructure"):
      
          [ paulmck: Call rcu_note_context_switch() with interrupts enabled. ]
      
      ...but despite this note, the patch still calls RCU with IRQs disabled.
      
      This seemingly innocuous change caused a significant regression in softirq
      CPU usage on the sending side of a large TCP transfer (~1 GB/s): when
      introducing 0.01% packet loss, the softirq usage would jump to around 25%,
      spiking as high as 50%. Before the change, the usage would never exceed 5%.
      
      Moving the call to rcu_note_context_switch() after the cond_resched()
      call, as it was originally before the hotplug patch, completely
      eliminated this problem.
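      
      In other words, the fix reorders the tail of run_ksoftirqd() so that RCU
      is poked with IRQs enabled, roughly:
      
        /* Before: rcu_note_context_switch() ran with IRQs still disabled. */
        __do_softirq();
        rcu_note_context_switch();
        local_irq_enable();
        cond_resched();
      
        /* After: as it was before the hotplug thread conversion. */
        __do_softirq();
        local_irq_enable();
        cond_resched();
        rcu_note_context_switch();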
      Signed-off-by: Calvin Owens <calvinowens@fb.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    • softirq/preempt: Add missing current->preempt_disable_ip update · 0f1ba9a2
      Heiko Carstens authored
      While debugging a "sleeping function called from invalid context" bug,
      I realized that the debugging message "Preemption disabled at:" pointed
      to an incorrect function.
      
      In particular if the last function/action that disabled preemption was
      spin_lock_bh() then current->preempt_disable_ip won't be updated.
      
      The reason for this is that __local_bh_disable_ip() will increase
      preempt_count manually instead of calling preempt_count_add(), which
      would handle the update correctly.
      
      It looks like the manual handling was done to work around some lockdep issue.
      
      So add the missing update of current->preempt_disable_ip to
      __local_bh_disable_ip() as well.
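      
      A sketch of the fix; the exact debug guard and how the instruction
      pointer is derived may differ from the final patch:
      
        void __local_bh_disable_ip(unsigned long ip, unsigned int cnt)
        {
                /* ... */
                __preempt_count_add(cnt);  /* manual, bypasses preempt_count_add() */
        #ifdef CONFIG_DEBUG_PREEMPT
                /* The fix: record the disable site so that the debug message
                 * "Preemption disabled at:" points at the right caller. */
                if (preempt_count() == cnt)
                        current->preempt_disable_ip = ip;
        #endif
                /* ... */
        }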
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20150107090441.GC4365@osiris
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  13. 04 Nov, 2014 1 commit
  14. 07 Sep, 2014 1 commit
  15. 26 Aug, 2014 1 commit
  16. 05 May, 2014 1 commit
  17. 29 Apr, 2014 1 commit
  18. 28 Apr, 2014 1 commit
    • genirq: x86: Ensure that dynamic irq allocation does not conflict · 62a08ae2
      Thomas Gleixner authored
      On x86 the allocation of irq descriptors may allocate interrupts which
      are in the range of the GSI interrupts. That's wrong, as those
      interrupts are hardwired and we don't have the irq domain translation
      like PPC. So one of these interrupts can be hooked up later to one of
      the devices which are hardwired to it, and the io_apic init code for
      that particular interrupt line happily reuses that descriptor with a
      completely different configuration, so hell breaks loose.
      
      Inside x86 we allocate dynamic interrupts from above nr_gsi_irqs,
      except for a few usage sites which have not yet blown up in our face
      for whatever reason. But for drivers which need an irq range, like the
      GPIO drivers, we have no limit in place and we don't want to expose
      such a detail to a driver.
      
      To cure this introduce a function which an architecture can implement
      to impose a lower bound on the dynamic interrupt allocations.
      
      Implement it for x86 and set the lower bound to nr_gsi_irqs, which is
      the end of the hardwired interrupt space, so all dynamic allocations
      happen above.
      
      That not only allows the GPIO driver to work sanely, it also protects
      the bogus callsites of create_irq_nr() in hpet, uv, irq_remapping and
      htirq code. They need to be cleaned up as well, but that's a separate
      issue.
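      
      A sketch of the hook, assuming a weak generic default plus an x86
      override (nr_irqs_gsi being the x86 symbol for the end of the hardwired
      GSI range in this era):
      
        /* Generic weak default: no constraint. */
        unsigned int __weak arch_dynirq_lower_bound(unsigned int from)
        {
                return from;
        }
      
        /* x86: never hand out descriptors inside the hardwired GSI range. */
        unsigned int arch_dynirq_lower_bound(unsigned int from)
        {
                return from < nr_irqs_gsi ? nr_irqs_gsi : from;
        }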
      Reported-by: Jin Yao <yao.jin@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Mathias Nyman <mathias.nyman@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Krogerus Heikki <heikki.krogerus@intel.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1404241617360.28206@ionos.tec.linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  19. 19 Mar, 2014 1 commit
  20. 28 Jan, 2014 3 commits
  21. 15 Jan, 2014 1 commit
    • tick: Rename tick_check_idle() to tick_irq_enter() · 5acac1be
      Frederic Weisbecker authored
      This makes the code more symmetric with the existing tick functions
      called on irq exit: tick_irq_exit() and tick_nohz_irq_exit().
      
      These functions are also symmetric as they mirror each other's action:
      we start to account idle time on irq exit and we stop this accounting
      on irq entry. Also the tick is stopped on irq exit and timekeeping
      catches up with the tickless time elapsed until we reach irq entry.
      
      This rename was suggested by Peter Zijlstra a long while ago but it
      got lost in the mass of other changes.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Alex Shi <alex.shi@linaro.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Link: http://lkml.kernel.org/r/1387320692-28460-2-git-send-email-fweisbec@gmail.com
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
  22. 13 Jan, 2014 2 commits
    • sched/preempt, locking: Rework local_bh_{dis,en}able() · 0bd3a173
      Peter Zijlstra authored
      Currently local_bh_disable() is out-of-line for no apparent reason,
      so inline it to save a few cycles on call/return nonsense; the
      function body is a single add on x86 (a few extra loads and stores
      on load/store archs).
      
      Also expose two new local_bh functions:
      
        __local_bh_{dis,en}able_ip(unsigned long ip, unsigned int cnt);
      
      which implement the actual local_bh_{dis,en}able() behaviour.
      
      The next patch uses the exposed @cnt argument to optimize bh lock
      functions.
      
      With build fixes from Jacob Pan.
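      
      The inline wrappers then reduce to roughly:
      
        static inline void local_bh_disable(void)
        {
                __local_bh_disable_ip(_THIS_IP_, SOFTIRQ_DISABLE_OFFSET);
        }
      
        static inline void local_bh_enable(void)
        {
                __local_bh_enable_ip(_THIS_IP_, SOFTIRQ_DISABLE_OFFSET);
        }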
      
      Cc: rjw@rjwysocki.net
      Cc: rui.zhang@intel.com
      Cc: jacob.jun.pan@linux.intel.com
      Cc: Mike Galbraith <bitbucket@online.de>
      Cc: hpa@zytor.com
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: lenb@kernel.org
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20131119151338.GF3694@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking: Optimize lock_bh functions · 9ea4c380
      Peter Zijlstra authored
      Currently all _bh_ lock functions do two preempt_count operations:
      
        local_bh_disable();
        preempt_disable();
      
      and for the unlock:
      
        preempt_enable_no_resched();
        local_bh_enable();
      
      Since it's a waste of perfectly good cycles to modify the same variable
      twice when you can do it in one go, use the new
      __local_bh_{dis,en}able_ip() functions that allow us to provide a
      preempt_count value to add/sub.
      
      So define SOFTIRQ_LOCK_OFFSET as the offset a _bh_ lock needs to
      add/sub to be done in one go.
      
      As a bonus it gets rid of the preempt_enable_no_resched() usage.
      
      This reduces a 1000 loops of:
      
        spin_lock_bh(&bh_lock);
        spin_unlock_bh(&bh_lock);
      
      from 53596 cycles to 51995 cycles. I didn't do enough measurements to
      say for absolutely sure that the result is significant, but the few
      runs I did for each suggest it is.
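      
      A sketch of the resulting lock-side path (close to the spinlock API of
      that era; details may differ):
      
        static inline void __raw_spin_lock_bh(raw_spinlock_t *lock)
        {
                /* One preempt_count update covers both the BH-disable and
                 * the preemption-disable parts. */
                __local_bh_disable_ip(_RET_IP_, SOFTIRQ_LOCK_OFFSET);
                spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
                LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
        }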
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: jacob.jun.pan@linux.intel.com
      Cc: Mike Galbraith <bitbucket@online.de>
      Cc: hpa@zytor.com
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: lenb@kernel.org
      Cc: rjw@rjwysocki.net
      Cc: rui.zhang@intel.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20131119151338.GF3694@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  23. 02 Dec, 2013 1 commit
    • nohz: Convert a few places to use local per cpu accesses · e8fcaa5c
      Frederic Weisbecker authored
      A few functions use remote per CPU access APIs when they
      deal with local values.
      
      Just do the right conversion to improve performance, code
      readability and debug checks.
      
      While at it, let's extend some of these function names with a *_this_cpu()
      suffix in order to make their purpose clearer.
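      
      The shape of the conversion, illustrated on the tick_cpu_sched per-CPU
      variable:
      
        /* Before: remote-capable accessor used for a purely local value. */
        struct tick_sched *ts = &per_cpu(tick_cpu_sched, smp_processor_id());
      
        /* After: local accessor; cheaper and covered by preemption checks. */
        struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);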
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
  24. 27 Nov, 2013 1 commit
  25. 19 Nov, 2013 1 commit
    • lockdep: Correctly annotate hardirq context in irq_exit() · f1a83e65
      Peter Zijlstra authored
      There was a reported deadlock on -rt which lockdep didn't report.
      
      It turns out that in irq_exit() we tell lockdep that the hardirq
      context ends and then do all kinds of locking afterwards.
      
      To fix it, move trace_hardirq_exit() to the very end of irq_exit(); this
      ensures all locking in tick_irq_exit() and rcu_irq_exit() is properly
      recorded as happening from hardirq context.
      
      This however leads to the 'fun' little problem of running softirqs
      while in hardirq context. To cure this make the softirq code a little
      more complex (in the CONFIG_TRACE_IRQFLAGS case).
      
      Due to arch-dependent stack swizzling trickery we cannot pass an
      argument to __do_softirq() to tell it whether it was invoked from
      hardirq context or not, so use a side-band argument instead.
      
      When we do __do_softirq() from hardirq context, 'atomically' flip to
      softirq context and back, so that no locking goes without being in
      either hard- or soft-irq context.
      
      I didn't find any new problems in mainline using this patch, but it
      did show the -rt problem.
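      
      The side-band trick looks approximately like this (CONFIG_TRACE_IRQFLAGS
      case; a sketch, details may differ from the patch):
      
        static inline bool lockdep_softirq_start(void)
        {
                bool in_hardirq = false;
      
                if (trace_hardirq_context(current)) {
                        /* 'Atomically' leave hardirq context for lockdep... */
                        in_hardirq = true;
                        trace_hardirq_exit();
                }
                lockdep_softirq_enter();
                return in_hardirq;
        }
      
        static inline void lockdep_softirq_end(bool in_hardirq)
        {
                lockdep_softirq_exit();
                if (in_hardirq)
                        trace_hardirq_enter();  /* ...and flip back. */
        }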
      Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-dgwc5cdksbn0jk09vbmcc9sa@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  26. 15 Nov, 2013 1 commit
  27. 01 Oct, 2013 6 commits
    • irq: Optimize softirq stack selection in irq exit · cc1f0274
      Frederic Weisbecker authored
      If irq_exit() is called on the arch's specified irq stack,
      it should be safe to run softirqs inline under that same
      irq stack as it is near empty by the time we call irq_exit().
      
      For example if we use the same stack for both hard and soft irqs here,
      the worst case scenario is:
      hardirq -> softirq -> hardirq. But then the softirq supersedes the
      first hardirq as the stack user, since irq_exit() is called on
      a mostly empty stack. So the stack merge in this case looks acceptable.
      
      Stack overruns still have a chance to happen if hardirqs have more
      opportunities to nest, but that's another problem to solve.
      
      So let's make the irq exit's softirq stack choice depend on a new Kconfig
      symbol that an arch can define when its irq_exit() runs on the irq stack.
      That way we can spare a stack switch during irq processing, and all the
      cache issues that come with it.
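      
      The resulting irq-exit path, roughly:
      
        static inline void invoke_softirq(void)
        {
                if (!force_irqthreads) {
        #ifdef CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
                        /* Already on the (near empty) irq stack: run inline. */
                        __do_softirq();
        #else
                        /* Possibly deep task stack: switch to the softirq stack. */
                        do_softirq_own_stack();
        #endif
                } else {
                        wakeup_softirqd();
                }
        }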
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@au1.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
    • irq: Justify the various softirq stack choices · 0bed698a
      Frederic Weisbecker authored
      For clarity, comment the various stack choices for softirq processing,
      whether we execute softirqs from ksoftirqd or from local_bh_enable()
      calls.
      
      Their use on irq_exit() is already commented.
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@au1.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
    • irq: Improve a bit softirq debugging · 5d60d3e7
      Frederic Weisbecker authored
      do_softirq() has a debug check that verifies that it is not nesting
      on softirq processing, nor miscounting the softirq part of the preempt
      count.
      
      But making sure that softirq processing doesn't nest is actually a more
      generic concern that applies to any caller of __do_softirq().
      
      So take it one step further and generalize that debug check to
      any softirq processing.
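      
      In effect the check moves to the common exit path, something like:
      
        asmlinkage void __do_softirq(void)
        {
                /* ... process pending softirqs ... */
                __local_bh_enable(SOFTIRQ_OFFSET);
                /* Now every caller, not just do_softirq(), gets verified. */
                WARN_ON_ONCE(in_interrupt());
        }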
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@au1.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
    • irq: Optimize call to softirq on hardirq exit · be6e1016
      Frederic Weisbecker authored
      Before processing softirqs on hardirq exit, we already
      do the check for pending softirqs while hardirqs are
      guaranteed to be disabled.
      
      So we can take a shortcut and safely jump to the arch
      specific implementation directly.
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@au1.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
    • irq: Consolidate do_softirq() arch overridden implementations · 7d65f4a6
      Frederic Weisbecker authored
      All arch-overridden implementations of do_softirq() share the following
      common code: disable irqs (to avoid races with the pending check),
      check if there are softirqs pending, then execute __do_softirq() on
      a specific stack.
      
      Consolidate the common parts such that archs only worry about the
      stack switch.
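      
      The consolidated generic wrapper then looks roughly like this, with
      archs supplying only do_softirq_own_stack():
      
        asmlinkage void do_softirq(void)
        {
                __u32 pending;
                unsigned long flags;
      
                if (in_interrupt())
                        return;
      
                local_irq_save(flags);          /* race-free pending check */
                pending = local_softirq_pending();
                if (pending)
                        do_softirq_own_stack(); /* arch does the stack switch */
                local_irq_restore(flags);
        }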
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@au1.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
    • irq: Force hardirq exit's softirq processing on its own stack · ded79754
      Frederic Weisbecker authored
      Commit facd8b80 ("irq: Sanitize invoke_softirq") converted irq exit
      calls of do_softirq() to __do_softirq() on all architectures,
      assuming it was only used there for its irq disablement properties.
      
      But as a side effect, the softirqs processed at the end of the hardirq
      are always run inline on the current stack used by irq_exit(), instead
      of the softirq stack provided by the archs that override do_softirq().
      
      The result is mostly safe if the architecture runs irq_exit()
      on a separate irq stack, because then softirqs are processed on that
      same stack, which is near empty at this stage (assuming hardirqs
      aren't nesting).
      
      Otherwise irq_exit() runs on the task stack, and so does the softirq.
      The interrupted call stack can be randomly deep already, and the
      softirq can dig through it even further. To add insult to injury,
      this softirq can be interrupted by a new hardirq, maximizing the
      chances of a stack overrun, as reported on powerpc for example:
      
      	do_IRQ: stack overflow: 1920
      	CPU: 0 PID: 1602 Comm: qemu-system-ppc Not tainted 3.10.4-300.1.fc19.ppc64p7 #1
      	Call Trace:
      	[c0000000050a8740] .show_stack+0x130/0x200 (unreliable)
      	[c0000000050a8810] .dump_stack+0x28/0x3c
      	[c0000000050a8880] .do_IRQ+0x2b8/0x2c0
      	[c0000000050a8930] hardware_interrupt_common+0x154/0x180
      	--- Exception: 501 at .cp_start_xmit+0x3a4/0x820 [8139cp]
      		LR = .cp_start_xmit+0x390/0x820 [8139cp]
      	[c0000000050a8d40] .dev_hard_start_xmit+0x394/0x640
      	[c0000000050a8e00] .sch_direct_xmit+0x110/0x260
      	[c0000000050a8ea0] .dev_queue_xmit+0x260/0x630
      	[c0000000050a8f40] .br_dev_queue_push_xmit+0xc4/0x130 [bridge]
      	[c0000000050a8fc0] .br_dev_xmit+0x198/0x270 [bridge]
      	[c0000000050a9070] .dev_hard_start_xmit+0x394/0x640
      	[c0000000050a9130] .dev_queue_xmit+0x428/0x630
      	[c0000000050a91d0] .ip_finish_output+0x2a4/0x550
      	[c0000000050a9290] .ip_local_out+0x50/0x70
      	[c0000000050a9310] .ip_queue_xmit+0x148/0x420
      	[c0000000050a93b0] .tcp_transmit_skb+0x4e4/0xaf0
      	[c0000000050a94a0] .__tcp_ack_snd_check+0x7c/0xf0
      	[c0000000050a9520] .tcp_rcv_established+0x1e8/0x930
      	[c0000000050a95f0] .tcp_v4_do_rcv+0x21c/0x570
      	[c0000000050a96c0] .tcp_v4_rcv+0x734/0x930
      	[c0000000050a97a0] .ip_local_deliver_finish+0x184/0x360
      	[c0000000050a9840] .ip_rcv_finish+0x148/0x400
      	[c0000000050a98d0] .__netif_receive_skb_core+0x4f8/0xb00
      	[c0000000050a99d0] .netif_receive_skb+0x44/0x110
      	[c0000000050a9a70] .br_handle_frame_finish+0x2bc/0x3f0 [bridge]
      	[c0000000050a9b20] .br_nf_pre_routing_finish+0x2ac/0x420 [bridge]
      	[c0000000050a9bd0] .br_nf_pre_routing+0x4dc/0x7d0 [bridge]
      	[c0000000050a9c70] .nf_iterate+0x114/0x130
      	[c0000000050a9d30] .nf_hook_slow+0xb4/0x1e0
      	[c0000000050a9e00] .br_handle_frame+0x290/0x330 [bridge]
      	[c0000000050a9ea0] .__netif_receive_skb_core+0x34c/0xb00
      	[c0000000050a9fa0] .netif_receive_skb+0x44/0x110
      	[c0000000050aa040] .napi_gro_receive+0xe8/0x120
      	[c0000000050aa0c0] .cp_rx_poll+0x31c/0x590 [8139cp]
      	[c0000000050aa1d0] .net_rx_action+0x1dc/0x310
      	[c0000000050aa2b0] .__do_softirq+0x158/0x330
      	[c0000000050aa3b0] .irq_exit+0xc8/0x110
      	[c0000000050aa430] .do_IRQ+0xdc/0x2c0
      	[c0000000050aa4e0] hardware_interrupt_common+0x154/0x180
      	 --- Exception: 501 at .bad_range+0x1c/0x110
      		 LR = .get_page_from_freelist+0x908/0xbb0
      	[c0000000050aa7d0] .list_del+0x18/0x50 (unreliable)
      	[c0000000050aa850] .get_page_from_freelist+0x908/0xbb0
      	[c0000000050aa9e0] .__alloc_pages_nodemask+0x21c/0xae0
      	[c0000000050aaba0] .alloc_pages_vma+0xd0/0x210
      	[c0000000050aac60] .handle_pte_fault+0x814/0xb70
      	[c0000000050aad50] .__get_user_pages+0x1a4/0x640
      	[c0000000050aae60] .get_user_pages_fast+0xec/0x160
      	[c0000000050aaf10] .__gfn_to_pfn_memslot+0x3b0/0x430 [kvm]
      	[c0000000050aafd0] .kvmppc_gfn_to_pfn+0x64/0x130 [kvm]
      	[c0000000050ab070] .kvmppc_mmu_map_page+0x94/0x530 [kvm]
      	[c0000000050ab190] .kvmppc_handle_pagefault+0x174/0x610 [kvm]
      	[c0000000050ab270] .kvmppc_handle_exit_pr+0x464/0x9b0 [kvm]
      	[c0000000050ab320]  kvm_start_lightweight+0x1ec/0x1fc [kvm]
      	[c0000000050ab4f0] .kvmppc_vcpu_run_pr+0x168/0x3b0 [kvm]
      	[c0000000050ab9c0] .kvmppc_vcpu_run+0xc8/0xf0 [kvm]
      	[c0000000050aba50] .kvm_arch_vcpu_ioctl_run+0x5c/0x1a0 [kvm]
      	[c0000000050abae0] .kvm_vcpu_ioctl+0x478/0x730 [kvm]
      	[c0000000050abc90] .do_vfs_ioctl+0x4ec/0x7c0
      	[c0000000050abd80] .SyS_ioctl+0xd4/0xf0
      	[c0000000050abe30] syscall_exit+0x0/0x98
      
      Since this is a regression, this patch proposes a minimalistic
      and low-risk solution: blindly force the hardirq exit processing of
      softirqs onto the softirq stack. This way we should significantly
      reduce the opportunities for task stack overflows dug by softirqs.
      
      Longer term solutions may involve extending the hardirq stack coverage to
      irq_exit(), etc...
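      
      The minimal fix itself is approximately:
      
        static inline void invoke_softirq(void)
        {
                if (!force_irqthreads)
                        /* Blindly switch to the softirq stack instead of
                         * digging further into the interrupted task stack. */
                        do_softirq_own_stack();
                else
                        wakeup_softirqd();
        }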
      Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: <stable@vger.kernel.org> # 3.9..
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@au1.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
  28. 25 Sep, 2013 2 commits