1. 04 Jan, 2019 1 commit
    • Remove 'type' argument from access_ok() function · 96d4f267
      Linus Torvalds authored
      Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
      of the user address range verification function since we got rid of the
      old racy i386-only code to walk page tables by hand.
      
      It existed because the original 80386 would not honor the write protect
      bit when in kernel mode, so you had to do COW by hand before doing any
      user access.  But we haven't supported that in a long time, and these
      days the 'type' argument is a purely historical artifact.
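      As a rough illustration of why a read/write 'type' adds nothing once COW
      no longer has to be done by hand: the remaining check depends only on the
      address range and an upper limit. The sketch below is a hypothetical
      user-space model, assuming a flat address space bounded by `user_limit`;
      `range_ok()` is an illustrative name, not the kernel's implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of an access_ok()-style check.
 * The range [addr, addr + size) must lie entirely below user_limit.
 * There is no read/write 'type' parameter: the answer is the same
 * for loads and stores. The comparison is arranged so that
 * addr + size is never computed, which avoids wraparound.
 */
static bool range_ok(uintptr_t addr, size_t size, uintptr_t user_limit)
{
	if (size > user_limit)
		return false;
	return addr <= user_limit - size;
}
```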
      
      A discussion about extending 'user_access_begin()' to do the range
      checking resulted in this patch, because there is no way we're going to
      move the old VERIFY_xyz interface to that model.  And it's best done at
      the end of the merge window when I've done most of my merges, so let's
      just get this done once and for all.
      
      This patch was mostly done with a sed-script, with manual fix-ups for
      the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
      
      There were a couple of notable cases:
      
       - csky still had the old "verify_area()" name as an alias.
      
       - the iter_iov code had magical hardcoded knowledge of the actual
         values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
         really used it)
      
       - microblaze used the type argument for a debug printout
      
      but other than those oddities this should be a total no-op patch.
      
      I tried to fix up all architectures, did fairly extensive grepping for
      access_ok() uses, and the changes are trivial, but I may have missed
      something.  Any missed conversion should be trivially fixable, though.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 12 Dec, 2018 1 commit
  3. 10 Dec, 2018 1 commit
  4. 29 Nov, 2018 1 commit
  5. 31 Oct, 2018 2 commits
    • mm: remove include/linux/bootmem.h · 57c8a661
      Mike Rapoport authored
      Move remaining definitions and declarations from include/linux/bootmem.h
      into include/linux/memblock.h and remove the redundant header.
      
      The includes were replaced using the semantic patch below, followed by
      semi-automated removal of duplicated '#include <linux/memblock.h>' lines.
      
      @@
      @@
      - #include <linux/bootmem.h>
      + #include <linux/memblock.h>
      
      [sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
        Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
      [sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
        Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
      [sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
        Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
      Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memblock: remove _virt from APIs returning virtual address · eb31d559
      Mike Rapoport authored
      The conversion is done using
      
      sed -i 's@memblock_virt_alloc@memblock_alloc@g' \
      	$(git grep -l memblock_virt_alloc)
      
      Link: http://lkml.kernel.org/r/1536927045-23536-8-git-send-email-rppt@linux.vnet.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Serge Semin <fancer.lancer@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 12 Oct, 2018 4 commits
    • printk: fix integer overflow in setup_log_buf() · d2130e82
      sergey-senozhatsky authored
      The way we calculate logbuf free space percentage overflows signed
      integer:
      
      	int free;
      
      	free = __LOG_BUF_LEN - log_next_idx;
      	pr_info("early log buf free: %u(%u%%)\n",
      		free, (free * 100) / __LOG_BUF_LEN);
      
      We support LOG_BUF_LEN of up to 1<<25 bytes. Since setup_log_buf() is
      called during early init, logbuf is mostly empty, so
      
      	__LOG_BUF_LEN - log_next_idx
      
      is close to 1<<25. Thus when we multiply it by 100, we overflow signed
      integer value range: 100 is 2^6 + 2^5 + 2^2.
      
      For example, booting with LOG_BUF_LEN 1<<25 and log_buf_len=2G
      boot param:
      
      [    0.075317] log_buf_len: -2147483648 bytes
      [    0.075319] early log buf free: 33549896(-28%)
      
      Make "free" unsigned integer and use appropriate printk() specifier.
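      To make the arithmetic concrete: with 33549896 bytes free, free * 100 =
      3354989600, which exceeds INT_MAX (2147483647). Signed overflow is
      undefined behavior in C; on typical two's-complement machines it wraps
      to -939977696, and dividing by 1<<25 gives the -28 seen above. The
      sketch below shows the unsigned calculation, assuming a 32-bit unsigned
      int; `logbuf_free_percent()`, `report_early_free()` and
      `LOG_BUF_LEN_DEMO` are hypothetical names, not the kernel's code.

```c
#include <stdio.h>

/* Hypothetical stand-in for the largest supported __LOG_BUF_LEN. */
#define LOG_BUF_LEN_DEMO (1U << 25)

/*
 * Fixed calculation: free_bytes is unsigned, so free_bytes * 100 is
 * well defined. Assuming a 32-bit unsigned int, the largest product
 * here, (1 << 25) * 100 = 3355443200, still fits below UINT_MAX
 * (4294967295), so the percentage is exact.
 */
static unsigned int logbuf_free_percent(unsigned int free_bytes,
					unsigned int buf_len)
{
	return (free_bytes * 100U) / buf_len;
}

/* Mirrors the message format with matching %u specifiers. */
void report_early_free(unsigned int log_next_idx)
{
	unsigned int free_bytes = LOG_BUF_LEN_DEMO - log_next_idx;

	printf("early log buf free: %u(%u%%)\n",
	       free_bytes, logbuf_free_percent(free_bytes, LOG_BUF_LEN_DEMO));
}
```

      With log_next_idx = 4536 this prints "early log buf free: 33549896(99%)".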
      
      Link: http://lkml.kernel.org/r/20181010113308.9337-1-sergey.senozhatsky@gmail.com
      To: Steven Rostedt <rostedt@goodmis.org>
      Cc: linux-kernel@vger.kernel.org
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Petr Mladek <pmladek@suse.com>
    • printk: do not preliminary split up cont buffer · 0e96a19c
      sergey-senozhatsky authored
      We have a proper 'overflow' check which tells us that we need to
      split up existing cont buffer in separate records:
      
      	if (cont.len + len > sizeof(cont.buf))
      		cont_flush();
      
      At the same time we also have one extra flush: "if cont buffer is
      80% full then split it up" in cont_add():
      
      	if (cont.len > (sizeof(cont.buf) * 80) / 100)
      		cont_flush();
      
      This looks redundant, since the existing "overflow" check should
      work just fine, so remove the 80% check and flush the cont buffer
      only on a normal cont termination (\n), or preliminarily on a
      possible buffer overflow or a cont race.
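      A minimal sketch of the remaining rule, assuming a hypothetical 16-byte
      buffer and made-up names (`struct cont_demo`, `cont_add_demo()`,
      `cont_flush_demo()`); it is not the kernel's cont code. The only
      size-based flush is the overflow check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of printk's cont buffer. */
struct cont_demo {
	char buf[16];
	size_t len;
	unsigned int flushes;   /* how many records were emitted */
};

static void cont_flush_demo(struct cont_demo *c)
{
	if (c->len == 0)
		return;
	/* A real implementation would log_store() c->buf here. */
	c->flushes++;
	c->len = 0;
}

/*
 * Append a fragment. Flush first only if it would not fit:
 * there is no extra "flush when 80% full" step.
 */
static void cont_add_demo(struct cont_demo *c, const char *text, size_t len)
{
	assert(len <= sizeof(c->buf));   /* fragment must fit an empty buffer */
	if (c->len + len > sizeof(c->buf))
		cont_flush_demo(c);
	memcpy(c->buf + c->len, text, len);
	c->len += len;
}
```

      With this buffer, the removed 80% rule (16 * 80 / 100 = 12 bytes) would
      already have flushed after a single 13-byte fragment; here the buffer
      fills to 16 bytes and is flushed only when the next fragment cannot fit.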
      
      Link: http://lkml.kernel.org/r/20181002023836.4487-4-sergey.senozhatsky@gmail.com
      To: Steven Rostedt <rostedt@goodmis.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dmitriy Vyukov <dvyukov@google.com>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: LKML <linux-kernel@vger.kernel.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Petr Mladek <pmladek@suse.com>
    • printk: lock/unlock console only for new logbuf entries · 3ac37a93
      sergey-senozhatsky authored
      Prior to commit 5c2992ee ("printk: remove console flushing special
      cases for partial buffered lines") we would do console_cont_flush()
      for each pr_cont() to print cont fragments, so console_unlock() would
      actually print data:
      
      	pr_cont();
      	 console_lock();
      	 console_unlock()
      	  console_cont_flush(); // print cont fragment
      	...
      	pr_cont();
      	 console_lock();
      	 console_unlock()
      	  console_cont_flush(); // print cont fragment
      
      We don't do console_cont_flush() anymore, so when we do pr_cont(),
      console_unlock() does nothing (unless we flushed the cont buffer):
      
      	pr_cont();
      	 console_lock();
      	 console_unlock();      // noop
      	...
      	pr_cont();
      	 console_lock();
      	 console_unlock();      // noop
      	...
      	pr_cont();
      	  cont_flush();
      	    console_lock();
      	    console_unlock();   // print data
      
      We also wake up klogd needlessly for pr_cont() output: the un-flushed
      cont buffer is not stored in log_buf, so there is nothing to pull.
      
      Thus we can console_lock()/console_unlock()/wake_up_klogd() only when
      we know that we log_store()-ed a message and there is something to
      print to the consoles/syslog.
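      The decision can be sketched as a simple gate. This is a hypothetical
      model: `vprintk_emit_demo()` and its counters stand in for the real
      console_lock()/console_unlock()/wake_up_klogd() calls, and it assumes a
      fragment is log_store()-ed only once it ends with '\n'.

```c
#include <stdbool.h>
#include <string.h>

static unsigned int console_cycles;  /* times console_lock()/unlock() would run */
static unsigned int klogd_wakeups;   /* times wake_up_klogd() would run */

/* Hypothetical: a fragment without a trailing '\n' stays in the cont
 * buffer; one ending in '\n' completes the line and is stored. */
static bool store_or_buffer_demo(const char *msg)
{
	size_t n = strlen(msg);

	return n > 0 && msg[n - 1] == '\n';
}

void vprintk_emit_demo(const char *msg)
{
	bool stored = store_or_buffer_demo(msg);

	/* Only touch the consoles and klogd when a new logbuf entry exists. */
	if (stored) {
		console_cycles++;
		klogd_wakeups++;
	}
}
```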
      
      Link: http://lkml.kernel.org/r/20181002023836.4487-3-sergey.senozhatsky@gmail.com
      To: Steven Rostedt <rostedt@goodmis.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dmitriy Vyukov <dvyukov@google.com>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: LKML <linux-kernel@vger.kernel.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Petr Mladek <pmladek@suse.com>
    • printk: keep kernel cont support always enabled · 9627808d
      sergey-senozhatsky authored
      Since commit 5c2992ee ("printk: remove console flushing special
      cases for partial buffered lines") we don't print cont fragments
      to the consoles; cont lines are now proper log_buf entries and
      there is no "consecutive continuation flag" anymore: we either
      have 'c' entries that mark continuation lines without fragments;
      or '-' entries that mark normal logbuf entries. There are no '+'
      entries anymore.
      
      However, a small leftover remains: the presence of ext_console
      drivers disables kernel cont support, so we flush each pr_cont()
      and store it as a separate log_buf entry. Previously, this worked
      because msg_print_ext_header() provided "an optional external merge
      of the records" functionality:
      
      	if (msg->flags & LOG_CONT)
      		cont = (prev_flags & LOG_CONT) ? '+' : 'c';
      
      We no longer do this, so keep kernel cont support always enabled.
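      The difference in the extended-header flag can be sketched as follows.
      `LOG_CONT_DEMO` is an assumed bit value and both functions are
      hypothetical illustrations: the old rule is the snippet quoted above
      (with '-' as the default), the new rule follows the description that
      only 'c' and '-' entries remain.

```c
/* Assumed flag value for illustration only. */
#define LOG_CONT_DEMO 0x8U

/* Old rule: '+' marked a fragment continuing a previous cont fragment. */
static char ext_cont_flag_old(unsigned int flags, unsigned int prev_flags)
{
	char cont = '-';

	if (flags & LOG_CONT_DEMO)
		cont = (prev_flags & LOG_CONT_DEMO) ? '+' : 'c';
	return cont;
}

/* Current model: cont lines are whole records, so only 'c' or '-'. */
static char ext_cont_flag_new(unsigned int flags)
{
	return (flags & LOG_CONT_DEMO) ? 'c' : '-';
}
```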
      
      Note from pmladek:
      
      The original purpose was to get full information including
      the metadata and dictionary via extended console drivers,
      see commit 6fe29354 ("printk: implement support
      for extended console drivers").
      
      The dictionary was probably the most important part, but
      it was actually lost:
      
        static void cont_flush(void)
        {
        [...]
      	log_store(cont.facility, cont.level, cont.flags, cont.ts_nsec,
      		  NULL, 0, cont.buf, cont.len);
      
      Nobody noticed because the only dictionary user is dev_printk()
      and dev_cont() is _not_ defined.
      
      Link: http://lkml.kernel.org/r/20181002023836.4487-2-sergey.senozhatsky@gmail.com
      To: Steven Rostedt <rostedt@goodmis.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dmitriy Vyukov <dvyukov@google.com>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: LKML <linux-kernel@vger.kernel.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      [pmladek@suse.com: Updated commit message]
      Signed-off-by: Petr Mladek <pmladek@suse.com>
  7. 09 Oct, 2018 1 commit
  8. 05 Oct, 2018 3 commits
  9. 02 Oct, 2018 2 commits
    • printk: CON_PRINTBUFFER console registration is a bit racy · 884e370e
      Sergey Senozhatsky authored
      CON_PRINTBUFFER console registration requires us to do several
      preparation steps:
      - Roll back console_seq to replay logbuf messages which were already
        seen on other consoles;
      - Set exclusive_console flag so console_unlock() will ->write() logbuf
        messages only to the exclusive_console driver.
      
      The way we do it, however, is a bit racy:
      
      	logbuf_lock_irqsave(flags);
      	console_seq = syslog_seq;
      	console_idx = syslog_idx;
      	logbuf_unlock_irqrestore(flags);
      						<< preemption enabled
      						<< irqs enabled
      	exclusive_console = newcon;
      	console_unlock();
      
      We roll back console_seq under logbuf_lock with IRQs disabled, but
      we set exclusive_console with local IRQs enabled and logbuf unlocked.
      If the system oopses or panics before we set exclusive_console (which
      is possible, given that IRQs and preemption are enabled), we will
      replay all logbuf messages to every registered console, which may be
      annoying and time-consuming.
      
      Move the exclusive_console assignment into the same IRQs-disabled,
      logbuf_lock-protected section where we roll back console_seq.
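      A user-space sketch of the fixed ordering, assuming a C11 atomic_flag
      spinlock as a stand-in for logbuf_lock; all `*_demo` names are
      hypothetical, and IRQ disabling is not modelled. Both assignments now
      happen inside one critical section.

```c
#include <stdatomic.h>

/* Hypothetical user-space stand-ins for the printk globals. */
static atomic_flag logbuf_lock_demo = ATOMIC_FLAG_INIT;
static unsigned long long console_seq_demo, syslog_seq_demo;
static const void *exclusive_console_demo;

static void lock_demo(void)
{
	while (atomic_flag_test_and_set_explicit(&logbuf_lock_demo,
						 memory_order_acquire))
		; /* spin until the holder releases the flag */
}

static void unlock_demo(void)
{
	atomic_flag_clear_explicit(&logbuf_lock_demo, memory_order_release);
}

/*
 * Roll back the console sequence and select the exclusive console in
 * the same critical section, so any other lock holder sees either
 * neither change or both. (The kernel also disables IRQs here.)
 */
void register_replay_demo(const void *newcon)
{
	lock_demo();
	console_seq_demo = syslog_seq_demo;
	exclusive_console_demo = newcon;
	unlock_demo();
}
```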
      
      Link: http://lkml.kernel.org/r/20180928095304.9972-1-sergey.senozhatsky@gmail.com
      To: Steven Rostedt <rostedt@goodmis.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Petr Mladek <pmladek@suse.com>
    • printk: Do not miss new messages when replaying the log · f92b070f
      Petr Mladek authored
      The variable "exclusive_console" is used to replay all existing messages
      on a newly registered console. It is cleared when all messages are out.
      
      The problem is that new messages might appear in the meantime. These
      are then visible only on the exclusive console.
      
      The obvious solution is to clear "exclusive_console" after we replay
      all messages that had already been processed before we started the replay.
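      A sketch of that idea: remember where the replay ends and drop exclusive
      mode once output reaches that point. The names (`exclusive_stop_seq`,
      `start_replay_demo()`, `flush_demo()`) and the fixed-size bookkeeping
      array are hypothetical, chosen only to make the behavior checkable.

```c
#include <stdbool.h>

#define NMSG_DEMO 8U

/* true if message seq went to every console, false if only the new one */
static bool printed_to_all[NMSG_DEMO];
static bool exclusive_active;
static unsigned int exclusive_stop_seq;

/* 'processed' messages were already printed on the other consoles;
 * the new console replays everything from 'first'. */
static unsigned int start_replay_demo(unsigned int first, unsigned int processed)
{
	exclusive_active = true;
	exclusive_stop_seq = processed;   /* end of the replay window */
	return first;                     /* rolled-back console sequence */
}

/* Print messages [seq, next); caller keeps next <= NMSG_DEMO. */
static void flush_demo(unsigned int seq, unsigned int next)
{
	for (; seq < next; seq++) {
		/* Old messages replayed: newer ones go to all consoles. */
		if (exclusive_active && seq >= exclusive_stop_seq)
			exclusive_active = false;
		printed_to_all[seq] = !exclusive_active;
	}
}
```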
      Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Link: http://lkml.kernel.org/r/20180913123406.14378-1-pmladek@suse.com
      To: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: linux-kernel@vger.kernel.org
      Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Petr Mladek <pmladek@suse.com>
  10. 11 Sep, 2018 1 commit
  11. 06 Sep, 2018 1 commit
    • printk/tracing: Do not trace printk_nmi_enter() · d1c392c9
      Steven Rostedt (VMware) authored
      I hit the following splat in my tests:
      
      ------------[ cut here ]------------
      IRQs not enabled as expected
      WARNING: CPU: 3 PID: 0 at kernel/time/tick-sched.c:982 tick_nohz_idle_enter+0x44/0x8c
      Modules linked in: ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipv6
      CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.0-rc2-test+ #2
      Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
      EIP: tick_nohz_idle_enter+0x44/0x8c
      Code: ec 05 00 00 00 75 26 83 b8 c0 05 00 00 00 75 1d 80 3d d0 36 3e c1 00
      75 14 68 94 63 12 c1 c6 05 d0 36 3e c1 01 e8 04 ee f8 ff <0f> 0b 58 fa bb a0
      e5 66 c1 e8 25 0f 04 00 64 03 1d 28 31 52 c1 8b
      EAX: 0000001c EBX: f26e7f8c ECX: 00000006 EDX: 00000007
      ESI: f26dd1c0 EDI: 00000000 EBP: f26e7f40 ESP: f26e7f38
      DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010296
      CR0: 80050033 CR2: 0813c6b0 CR3: 2f342000 CR4: 001406f0
      Call Trace:
       do_idle+0x33/0x202
       cpu_startup_entry+0x61/0x63
       start_secondary+0x18e/0x1ed
       startup_32_smp+0x164/0x168
      irq event stamp: 18773830
      hardirqs last  enabled at (18773829): [<c040150c>] trace_hardirqs_on_thunk+0xc/0x10
      hardirqs last disabled at (18773830): [<c040151c>] trace_hardirqs_off_thunk+0xc/0x10
      softirqs last  enabled at (18773824): [<c0ddaa6f>] __do_softirq+0x25f/0x2bf
      softirqs last disabled at (18773767): [<c0416bbe>] call_on_stack+0x45/0x4b
      ---[ end trace b7c64aa79e17954a ]---
      
      After a bit of debugging, I found what was happening. This would trigger
      when performing "perf" with a high NMI interrupt rate, while enabling and
      disabling function tracer. Ftrace uses breakpoints to convert the nops at
      the start of functions to calls to the function trampolines. The breakpoint
      traps disable interrupts and this makes calls into lockdep via the
      trace_hardirqs_off_thunk in the entry.S code. What happens is the following:
      
        do_idle {
      
          [interrupts enabled]
      
          <interrupt> [interrupts disabled]
      	TRACE_IRQS_OFF [lockdep says irqs off]
      	[...]
      	TRACE_IRQS_IRET
      	    test if pt_regs say return to interrupts enabled [yes]
      	    TRACE_IRQS_ON [lockdep says irqs are on]
      
      	    <nmi>
      		nmi_enter() {
      		    printk_nmi_enter() [traced by ftrace]
      		    [ hit ftrace breakpoint ]
      		    <breakpoint exception>
      			TRACE_IRQS_OFF [lockdep says irqs off]
      			[...]
      			TRACE_IRQS_IRET [return from breakpoint]
      			   test if pt_regs say interrupts enabled [no]
      			   [iret back to interrupt]
      	   [iret back to code]
      
          tick_nohz_idle_enter() {
      
      	lockdep_assert_irqs_enabled() [lockdep says no!]
      
      Although interrupts are indeed enabled, lockdep thinks they are not, and
      since we now do asserts via lockdep, it gives a false warning. The issue
      here is that printk_nmi_enter() is called before lockdep_off(), which
      disables lockdep in NMIs (for exactly this reason). By simply not allowing
      ftrace to see printk_nmi_enter() (via the notrace annotation) we keep
      lockdep from getting confused.
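      A user-space analogy for what a notrace-style annotation does, assuming
      GCC or Clang: code built with -finstrument-functions calls the
      __cyg_profile_func_enter()/__cyg_profile_func_exit() hooks on every
      function entry and exit, except for functions carrying the
      no_instrument_function attribute. The kernel's notrace is built on
      compiler attributes in a similar spirit, but the exact definition
      varies by configuration; `notrace_demo` and `nmi_enter_hook_demo()` are
      hypothetical names.

```c
/* Hypothetical analogue of the kernel's notrace annotation. */
#define notrace_demo __attribute__((no_instrument_function))

static unsigned int traced_calls;  /* counts instrumentation hook calls */

/* Hooks called under -finstrument-functions; they must not be
 * instrumented themselves or they would recurse. */
notrace_demo void __cyg_profile_func_enter(void *fn, void *site)
{
	(void)fn;
	(void)site;
	traced_calls++;
}

notrace_demo void __cyg_profile_func_exit(void *fn, void *site)
{
	(void)fn;
	(void)site;
}

/* Stand-in for printk_nmi_enter(): never reported to the hook, so no
 * tracing code runs on this path even when instrumentation is on. */
notrace_demo void nmi_enter_hook_demo(void)
{
}
```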
      
      Cc: stable@vger.kernel.org
      Fixes: 42a0bb3f ("printk/nmi: generic solution for safe printk in NMI")
      Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Petr Mladek <pmladek@suse.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  12. 30 Aug, 2018 1 commit
  13. 22 Aug, 2018 1 commit
  14. 31 Jul, 2018 1 commit
  15. 11 Jul, 2018 1 commit
  16. 09 Jul, 2018 4 commits
  17. 28 Jun, 2018 1 commit
  18. 27 Jun, 2018 2 commits
  19. 05 Jun, 2018 1 commit
  20. 16 May, 2018 1 commit
  21. 25 Apr, 2018 1 commit
    • printk: wake up klogd in vprintk_emit · 43a17111
      Sergey Senozhatsky authored
      We wake up klogd very late: only when the current console_sem owner
      is done pushing pending kernel messages to the serial/net consoles.
      In some cases this results in lost syslog messages, because the kernel
      log buffer is a circular buffer, and if we don't wake up syslog for
      long enough, the logbuf may simply wrap around.
      
      The patch moves the klogd wake up call to vprintk_emit(), which is
      the only legitimate way for a kernel message to appear in the logbuf,
      right after the attempt to handle consoles. As a result, klogd
      will be woken either after flushing the new message to consoles
      or immediately when consoles are still busy with older messages.
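      Why a late wakeup loses data can be shown with a sequence-numbered ring:
      once the writer is more than one buffer's worth ahead of the reader, the
      oldest records are gone. This is a hypothetical model (`struct ring_demo`,
      a 4-slot ring), not the kernel's logbuf layout.

```c
/* Hypothetical ring holding the most recent RING_SLOTS_DEMO records. */
#define RING_SLOTS_DEMO 4ULL

struct ring_demo {
	unsigned long long next_seq;   /* sequence of the next record written */
};

/* Oldest sequence number still present in the ring. */
static unsigned long long ring_first_seq(const struct ring_demo *r)
{
	return r->next_seq > RING_SLOTS_DEMO ? r->next_seq - RING_SLOTS_DEMO : 0;
}

/* Records a reader at reader_seq has lost because the ring wrapped. */
static unsigned long long lost_records(const struct ring_demo *r,
				       unsigned long long reader_seq)
{
	unsigned long long first = ring_first_seq(r);

	return reader_seq < first ? first - reader_seq : 0;
}
```

      For example, after 10 writes to the 4-slot ring, a reader still at
      sequence 3 has lost records 3, 4 and 5.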
      
      Link: http://lkml.kernel.org/r/20180419014250.5692-1-sergey.senozhatsky@gmail.com
      To: Steven Rostedt <rostedt@goodmis.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Petr Mladek <pmladek@suse.com>
  22. 06 Apr, 2018 1 commit
  23. 23 Mar, 2018 1 commit
  24. 15 Mar, 2018 1 commit
  25. 27 Feb, 2018 1 commit
    • printk: Wake klogd when passing console_lock owner · c14376de
      Petr Mladek authored
      wake_klogd is a local variable in console_unlock(). The information
      is lost when console_lock ownership is passed to a spinning waiter via
      the busy wait added by commit dbdda842 ("printk: Add console owner and
      waiter logic to load balance console writes"). The following race is
      possible:
      
      CPU0				CPU1
      console_unlock()
      
        for (;;)
           /* calling console for last message */
      
      				printk()
      				  log_store()
      				    log_next_seq++;
      
           /* see new message */
           if (seen_seq != log_next_seq) {
      	wake_klogd = true;
      	seen_seq = log_next_seq;
           }
      
           console_lock_spinning_enable();
      
      				  if (console_trylock_spinning())
      				     /* spinning */
      
           if (console_lock_spinning_disable_and_check()) {
      	printk_safe_exit_irqrestore(flags);
      	return;
      
      				  console_unlock()
      				    if (seen_seq != log_next_seq) {
      				    /* already seen */
      				    /* nothing to do */
      
      Result: nobody wakes up klogd.
      
      One solution would be to make wake_klogd a global variable, but
      then we would need to manipulate it under a lock or similar.
      
      This patch also wakes klogd when console_lock is passed to the
      spinning waiter. This looks like the right way to go; userspace
      should also have a chance to see and store any "flood" of messages.
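      The shape of the fix can be sketched as an exit path that honors the
      local wake_klogd decision even when ownership is handed off. All names
      below are hypothetical; `handed_off_demo()` stands in for
      console_lock_spinning_disable_and_check(), and the counter stands in
      for wake_up_klogd().

```c
#include <stdbool.h>

static unsigned int klogd_wakeups_demo;   /* would be wake_up_klogd() calls */

/* Stand-in: true when a spinning waiter has taken over console_lock. */
static bool handed_off_demo(bool waiter_spinning)
{
	return waiter_spinning;
}

/*
 * Tail of a console_unlock()-like loop. The new owner has already seen
 * log_next_seq and will not set its own wake_klogd, so the outgoing
 * owner must act on its local decision before returning.
 */
static void console_unlock_tail_demo(bool wake_klogd, bool waiter_spinning)
{
	if (handed_off_demo(waiter_spinning)) {
		if (wake_klogd)
			klogd_wakeups_demo++;   /* wake before handing off */
		return;
	}
	/* ... normal unlock path ... */
	if (wake_klogd)
		klogd_wakeups_demo++;
}
```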
      
      Note that the very late klogd wake up was a historic solution.
      It made sense on single CPU systems or when sys_syslog() operations
      were synchronized using the big kernel lock like in v2.1.113.
      But it is questionable these days.
      
      Fixes: dbdda842 ("printk: Add console owner and waiter logic to load balance console writes")
      Link: http://lkml.kernel.org/r/20180226155734.dzwg3aovqnwtvkoy@pathway.suse.cz
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: linux-kernel@vger.kernel.org
      Cc: Tejun Heo <tj@kernel.org>
      Suggested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Petr Mladek <pmladek@suse.com>
  26. 11 Feb, 2018 1 commit
    • vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds authored
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But the keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
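      To see the "almost" on a given target, one can compare the user-space
      <poll.h> and <sys/epoll.h> values. This is a Linux-only sketch and
      `poll_epoll_mismatches()` is a hypothetical helper; it checks only the
      basic POSIX flags (which match on common Linux targets) and says nothing
      about the kernel-internal constants the script converted.

```c
#include <poll.h>
#include <stddef.h>
#include <sys/epoll.h>

/* Count basic POLL* / EPOLL* pairs whose numeric values differ. */
static int poll_epoll_mismatches(void)
{
	const struct { int poll_val; int epoll_val; } pairs[] = {
		{ POLLIN,  EPOLLIN  },
		{ POLLPRI, EPOLLPRI },
		{ POLLOUT, EPOLLOUT },
		{ POLLERR, EPOLLERR },
		{ POLLHUP, EPOLLHUP },
	};
	int mismatches = 0;

	for (size_t i = 0; i < sizeof(pairs) / sizeof(pairs[0]); i++)
		if (pairs[i].poll_val != pairs[i].epoll_val)
			mismatches++;
	return mismatches;
}
```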
      Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  27. 08 Feb, 2018 1 commit
  28. 22 Jan, 2018 1 commit
  29. 16 Jan, 2018 1 commit
    • printk: Never set console_may_schedule in console_trylock() · fd5f7cde
      Sergey Senozhatsky authored
      This patch, basically, reverts commit 6b97a20d ("printk:
      set may_schedule for some of console_trylock() callers").
      That commit was a mistake: it introduced a big dependency
      on the scheduler, by enabling preemption under console_sem
      in printk()->console_unlock() path, which is rather too
      critical. The patch did not significantly reduce the
      possibilities of printk() lockups, but made it possible to
      stall printk(), as has been reported by Tetsuo Handa [1].
      
      Another issue is that preemption under console_sem also
      interferes with Steven Rostedt's hand-off scheme, by making
      it possible to sleep with console_sem both in console_unlock()
      and in vprintk_emit(), after acquiring the console_sem
      ownership (anywhere between printk_safe_exit_irqrestore() in
      console_trylock_spinning() and printk_safe_enter_irqsave()
      in console_unlock()). This makes hand-off less likely and,
      at the same time, may result in a significant amount of
      pending logbuf messages. A preempted console_sem owner makes
      it impossible for other CPUs to emit logbuf messages, but
      does not prevent other CPUs from appending new
      messages to the logbuf.
      
      Reinstate the old behavior and make printk() non-preemptible.
      Should any printk() lockup reports arrive they must be handled
      in a different way.
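      A minimal model of the restored rule: only the sleepable console_lock()
      path permits rescheduling inside console_unlock(), while
      console_trylock() never does. `struct console_state_demo` and the
      `*_demo` functions are hypothetical stand-ins for console_sem state and
      console_may_schedule.

```c
#include <stdbool.h>

/* Hypothetical model of console_sem state relevant to scheduling. */
struct console_state_demo {
	bool locked;
	bool may_schedule;   /* may console_unlock() reschedule? */
};

/* console_lock(): caller may sleep, so rescheduling is allowed. */
static void console_lock_demo(struct console_state_demo *s)
{
	s->locked = true;
	s->may_schedule = true;
}

/* console_trylock(): never sets may_schedule, so the
 * printk() -> console_unlock() path stays non-preemptible. */
static bool console_trylock_demo(struct console_state_demo *s)
{
	if (s->locked)
		return false;
	s->locked = true;
	s->may_schedule = false;
	return true;
}
```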
      
      [1] http://lkml.kernel.org/r/201603022101.CAH73907.OVOOMFHFFtQJSL@I-love.SAKURA.ne.jp
      Fixes: 6b97a20d ("printk: set may_schedule for some of console_trylock() callers")
      Link: http://lkml.kernel.org/r/20180116044716.GE6607@jagdpanzerIV
      To: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: akpm@linux-foundation.org
      Cc: linux-mm@kvack.org
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Petr Mladek <pmladek@suse.com>