1. 05 Apr, 2019 1 commit
    • Valentin Schneider's avatar
      cpu/hotplug: Mute hotplug lockdep during init · 964065d2
      Valentin Schneider authored
      [ Upstream commit ce48c457 ]
      
      Since we've had:
      
        commit cb538267 ("jump_label/lockdep: Assert we hold the hotplug lock for _cpuslocked() operations")
      
      we've been getting some lockdep warnings during init, such as on HiKey960:
      
      [    0.820495] WARNING: CPU: 4 PID: 0 at kernel/cpu.c:316 lockdep_assert_cpus_held+0x3c/0x48
      [    0.820498] Modules linked in:
      [    0.820509] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G S                4.20.0-rc5-00051-g4cae42a #34
      [    0.820511] Hardware name: HiKey960 (DT)
      [    0.820516] pstate: 600001c5 (nZCv dAIF -PAN -UAO)
      [    0.820520] pc : lockdep_assert_cpus_held+0x3c/0x48
      [    0.820523] lr : lockdep_assert_cpus_held+0x38/0x48
      [    0.820526] sp : ffff00000a9cbe50
      [    0.820528] x29: ffff00000a9cbe50 x28: 0000000000000000
      [    0.820533] x27: 00008000b69e5000 x26: ffff8000bff4cfe0
      [    0.820537] x25: ffff000008ba69e0 x24: 0000000000000001
      [    0.820541] x23: ffff000008fce000 x22: ffff000008ba70c8
      [    0.820545] x21: 0000000000000001 x20: 0000000000000003
      [    0.820548] x19: ffff00000a35d628 x18: ffffffffffffffff
      [    0.820552] x17: 0000000000000000 x16: 0000000000000000
      [    0.820556] x15: ffff00000958f848 x14: 455f3052464d4d34
      [    0.820559] x13: 00000000769dde98 x12: ffff8000bf3f65a8
      [    0.820564] x11: 0000000000000000 x10: ffff00000958f848
      [    0.820567] x9 : ffff000009592000 x8 : ffff00000958f848
      [    0.820571] x7 : ffff00000818ffa0 x6 : 0000000000000000
      [    0.820574] x5 : 0000000000000000 x4 : 0000000000000001
      [    0.820578] x3 : 0000000000000000 x2 : 0000000000000001
      [    0.820582] x1 : 00000000ffffffff x0 : 0000000000000000
      [    0.820587] Call trace:
      [    0.820591]  lockdep_assert_cpus_held+0x3c/0x48
      [    0.820598]  static_key_enable_cpuslocked+0x28/0xd0
      [    0.820606]  arch_timer_check_ool_workaround+0xe8/0x228
      [    0.820610]  arch_timer_starting_cpu+0xe4/0x2d8
      [    0.820615]  cpuhp_invoke_callback+0xe8/0xd08
      [    0.820619]  notify_cpu_starting+0x80/0xb8
      [    0.820625]  secondary_start_kernel+0x118/0x1d0
      
      We've also had a similar warning in sched_init_smp() for every
      asymmetric system that would enable the sched_asym_cpucapacity static
      key, although that was singled out in:
      
        commit 40fa3780 ("sched/core: Take the hotplug lock in sched_init_smp()")
      
      Those warnings are actually harmless, since we cannot have hotplug
      operations at the time they appear. Instead of starting to sprinkle
      useless hotplug lock operations in the init codepaths, mute the
      warnings until they start warning about real problems.
      Suggested-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarValentin Schneider <valentin.schneider@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: cai@gmx.us
      Cc: daniel.lezcano@linaro.org
      Cc: dietmar.eggemann@arm.com
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: longman@redhat.com
      Cc: marc.zyngier@arm.com
      Cc: mark.rutland@arm.com
      Link: https://lkml.kernel.org/r/1545243796-23224-2-git-send-email-valentin.schneider@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      964065d2
  2. 03 Apr, 2019 1 commit
    • Thomas Gleixner's avatar
      cpu/hotplug: Prevent crash when CPU bringup fails on CONFIG_HOTPLUG_CPU=n · c3bcf031
      Thomas Gleixner authored
      commit 206b9235 upstream.
      
      Tianyu reported a crash in a CPU hotplug teardown callback when booting a
      kernel which has CONFIG_HOTPLUG_CPU disabled with the 'nosmt' boot
      parameter.
      
      It turns out that the SMP=y CONFIG_HOTPLUG_CPU=n case has been broken
      forever in case that a bringup callback fails. Unfortunately this issue was
      not recognized when the CPU hotplug code was reworked, so the shortcoming
      just stayed in place.
      
      When a bringup callback fails, the CPU hotplug code rolls back the
      operation and takes the CPU offline.
      
      The 'nosmt' command line argument uses a bringup failure to abort the
      bringup of SMT sibling CPUs. This partial bringup is required due to the
      MCE misdesign on Intel CPUs.
      
      With CONFIG_HOTPLUG_CPU=y the rollback works perfectly fine, but
      CONFIG_HOTPLUG_CPU=n lacks essential mechanisms to exercise the low level
      teardown of a CPU including the synchronizations in various facilities like
      RCU, NOHZ and others.
      
      As a consequence the teardown callbacks which must be executed on the
      outgoing CPU within stop machine with interrupts disabled are executed on
      the control CPU in interrupt enabled and preemptible context causing the
      kernel to crash and burn. The pre state machine code has a different
      failure mode which is more subtle and resulting in a less obvious use after
      free crash because the control side frees resources which are still in use
      by the undead CPU.
      
      But this is not a x86 only problem. Any architecture which supports the
      SMP=y HOTPLUG_CPU=n combination suffers from the same issue. It's just less
      likely to be triggered because in 99.99999% of the cases all bringup
      callbacks succeed.
      
      The easy solution of making HOTPLUG_CPU mandatory for SMP is not working on
      all architectures as the following architectures have either no hotplug
      support at all or not all subarchitectures support it:
      
       alpha, arc, hexagon, openrisc, riscv, sparc (32bit), mips (partial).
      
      Crashing the kernel in such a situation is not an acceptable state
      either.
      
      Implement a minimal rollback variant by limiting the teardown to the point
      where all regular teardown callbacks have been invoked and leave the CPU in
      the 'dead' idle state. This has the following consequences:
      
       - the CPU is brought down to the point where the stop_machine takedown
         would happen.
      
       - the CPU stays there forever and is idle
      
       - The CPU is cleared in the CPU active mask, but not in the CPU online
         mask which is a legit state.
      
       - Interrupts are not forced away from the CPU
      
       - All facilities which only look at online mask would still see it, but
         that is the case during normal hotplug/unplug operations as well. It's
         just a (way) longer time frame.
      
      This will expose issues, which haven't been exposed before or only seldom,
      because now the normally transient state of being non active but online is
      a permanent state. In testing this exposed already an issue vs. work queues
      where the vmstat code schedules work on the almost dead CPU which ends up
      in an unbound workqueue and triggers 'preemtible context' warnings. This is
      not a problem of this change, it merily exposes an already existing issue.
      Still this is better than crashing fully without a chance to debug it.
      
      This is mainly thought as workaround for those architectures which do not
      support HOTPLUG_CPU. All others should enforce HOTPLUG_CPU for SMP.
      
      Fixes: 2e1a3483 ("cpu/hotplug: Split out the state walk into functions")
      Reported-by: default avatarTianyu Lan <Tianyu.Lan@microsoft.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Tested-by: default avatarTianyu Lan <Tianyu.Lan@microsoft.com>
      Acked-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Konrad Wilk <konrad.wilk@oracle.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Mukesh Ojha <mojha@codeaurora.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Micheal Kelley <michael.h.kelley@microsoft.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190326163811.503390616@linutronix.deSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c3bcf031
  3. 30 Jan, 2019 1 commit
    • Josh Poimboeuf's avatar
      cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM · b284909a
      Josh Poimboeuf authored
      With the following commit:
      
        73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS")
      
      ... the hotplug code attempted to detect when SMT was disabled by BIOS,
      in which case it reported SMT as permanently disabled.  However, that
      code broke a virt hotplug scenario, where the guest is booted with only
      primary CPU threads, and a sibling is brought online later.
      
      The problem is that there doesn't seem to be a way to reliably
      distinguish between the HW "SMT disabled by BIOS" case and the virt
      "sibling not yet brought online" case.  So the above-mentioned commit
      was a bit misguided, as it permanently disabled SMT for both cases,
      preventing future virt sibling hotplugs.
      
      Going back and reviewing the original problems which were attempted to
      be solved by that commit, when SMT was disabled in BIOS:
      
        1) /sys/devices/system/cpu/smt/control showed "on" instead of
           "notsupported"; and
      
        2) vmx_vm_init() was incorrectly showing the L1TF_MSG_SMT warning.
      
      I'd propose that we instead consider #1 above to not actually be a
      problem.  Because, at least in the virt case, it's possible that SMT
      wasn't disabled by BIOS and a sibling thread could be brought online
      later.  So it makes sense to just always default the smt control to "on"
      to allow for that possibility (assuming cpuid indicates that the CPU
      supports SMT).
      
      The real problem is #2, which has a simple fix: change vmx_vm_init() to
      query the actual current SMT state -- i.e., whether any siblings are
      currently online -- instead of looking at the SMT "control" sysfs value.
      
      So fix it by:
      
        a) reverting the original "fix" and its followup fix:
      
           73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS")
           bc2d8d26 ("cpu/hotplug: Fix SMT supported evaluation")
      
           and
      
        b) changing vmx_vm_init() to query the actual current SMT state --
           instead of the sysfs control value -- to determine whether the L1TF
           warning is needed.  This also requires the 'sched_smt_present'
           variable to exported, instead of 'cpu_smt_control'.
      
      Fixes: 73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS")
      Reported-by: default avatarIgor Mammedov <imammedo@redhat.com>
      Signed-off-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Joe Mario <jmario@redhat.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: kvm@vger.kernel.org
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/e3a85d585da28cc333ecbc1e78ee9216e6da9396.1548794349.git.jpoimboe@redhat.com
      b284909a
  4. 29 Jan, 2019 1 commit
  5. 28 Nov, 2018 1 commit
    • Thomas Gleixner's avatar
      x86/speculation: Rework SMT state change · a74cfffb
      Thomas Gleixner authored
      arch_smt_update() is only called when the sysfs SMT control knob is
      changed. This means that when SMT is enabled in the sysfs control knob the
      system is considered to have SMT active even if all siblings are offline.
      
      To allow finegrained control of the speculation mitigations, the actual SMT
      state is more interesting than the fact that siblings could be enabled.
      
      Rework the code, so arch_smt_update() is invoked from each individual CPU
      hotplug function, and simplify the update function while at it.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185004.521974984@linutronix.de
      
      a74cfffb
  6. 05 Oct, 2018 1 commit
  7. 26 Sep, 2018 1 commit
    • Jiri Kosina's avatar
      x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation · 53c613fe
      Jiri Kosina authored
      STIBP is a feature provided by certain Intel ucodes / CPUs. This feature
      (once enabled) prevents cross-hyperthread control of decisions made by
      indirect branch predictors.
      
      Enable this feature if
      
      - the CPU is vulnerable to spectre v2
      - the CPU supports SMT and has SMT siblings online
      - spectre_v2 mitigation autoselection is enabled (default)
      
      After some previous discussion, this leaves STIBP on all the time, as wrmsr
      on crossing kernel boundary is a no-no. This could perhaps later be a bit
      more optimized (like disabling it in NOHZ, experiment with disabling it in
      idle, etc) if needed.
      
      Note that the synchronization of the mask manipulation via newly added
      spec_ctrl_mutex is currently not strictly needed, as the only updater is
      already being serialized by cpu_add_remove_lock, but let's make this a
      little bit more future-proof.
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc:  "WoodhouseDavid" <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc:  "SchauflerCasey" <casey.schaufler@intel.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251438240.15880@cbobk.fhfr.pm
      53c613fe
  8. 11 Sep, 2018 1 commit
  9. 06 Sep, 2018 2 commits
  10. 31 Aug, 2018 1 commit
  11. 14 Aug, 2018 1 commit
  12. 12 Aug, 2018 1 commit
    • Linus Torvalds's avatar
      init: rename and re-order boot_cpu_state_init() · b5b1404d
      Linus Torvalds authored
      This is purely a preparatory patch for upcoming changes during the 4.19
      merge window.
      
      We have a function called "boot_cpu_state_init()" that isn't really
      about the bootup cpu state: that is done much earlier by the similarly
      named "boot_cpu_init()" (note lack of "state" in name).
      
      This function initializes some hotplug CPU state, and needs to run after
      the percpu data has been properly initialized.  It even has a comment to
      that effect.
      
      Except it _doesn't_ actually run after the percpu data has been properly
      initialized.  On x86 it happens to do that, but on at least arm and
      arm64, the percpu base pointers are initialized by the arch-specific
      'smp_prepare_boot_cpu()' hook, which ran _after_ boot_cpu_state_init().
      
      This had some unexpected results, and in particular we have a patch
      pending for the merge window that did the obvious cleanup of using
      'this_cpu_write()' in the cpu hotplug init code:
      
        -       per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE;
        +       this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);
      
      which is obviously the right thing to do.  Except because of the
      ordering issue, it actually failed miserably and unexpectedly on arm64.
      
      So this just fixes the ordering, and changes the name of the function to
      be 'boot_cpu_hotplug_init()' to make it obvious that it's about cpu
      hotplug state, because the core CPU state was supposed to have already
      been done earlier.
      
      Marked for stable, since the (not yet merged) patch that will show this
      problem is marked for stable.
      Reported-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Reported-by: Mian Yousaf Kaukab's avatarMian Yousaf Kaukab <yousaf.kaukab@suse.com>
      Suggested-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Acked-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: stable@kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b5b1404d
  13. 07 Aug, 2018 1 commit
    • Thomas Gleixner's avatar
      cpu/hotplug: Fix SMT supported evaluation · bc2d8d26
      Thomas Gleixner authored
      Josh reported that the late SMT evaluation in cpu_smt_state_init() sets
      cpu_smt_control to CPU_SMT_NOT_SUPPORTED in case that 'nosmt' was supplied
      on the kernel command line as it cannot differentiate between SMT disabled
      by BIOS and SMT soft disable via 'nosmt'. That wreckages the state and
      makes the sysfs interface unusable.
      
      Rework this so that during bringup of the non boot CPUs the availability of
      SMT is determined in cpu_smt_allowed(). If a newly booted CPU is not a
      'primary' thread then set the local cpu_smt_available marker and evaluate
      this explicitely right after the initial SMP bringup has finished.
      
      SMT evaulation on x86 is a trainwreck as the firmware has all the
      information _before_ booting the kernel, but there is no interface to query
      it.
      
      Fixes: 73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS")
      Reported-by: default avatarJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      bc2d8d26
  14. 30 Jul, 2018 1 commit
  15. 26 Jul, 2018 1 commit
  16. 24 Jul, 2018 1 commit
  17. 13 Jul, 2018 2 commits
  18. 09 Jul, 2018 1 commit
  19. 04 Jul, 2018 1 commit
    • Konrad Rzeszutek Wilk's avatar
      x86/KVM: Warn user if KVM is loaded SMT and L1TF CPU bug being present · 26acfb66
      Konrad Rzeszutek Wilk authored
      If the L1TF CPU bug is present we allow the KVM module to be loaded as the
      major of users that use Linux and KVM have trusted guests and do not want a
      broken setup.
      
      Cloud vendors are the ones that are uncomfortable with CVE 2018-3620 and as
      such they are the ones that should set nosmt to one.
      
      Setting 'nosmt' means that the system administrator also needs to disable
      SMT (Hyper-threading) in the BIOS, or via the 'nosmt' command line
      parameter, or via the /sys/devices/system/cpu/smt/control. See commit
      05736e4a ("cpu/hotplug: Provide knobs to control SMT").
      
      Other mitigations are to use task affinity, cpu sets, interrupt binding,
      etc - anything to make sure that _only_ the same guests vCPUs are running
      on sibling threads.
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      26acfb66
  20. 03 Jul, 2018 1 commit
  21. 02 Jul, 2018 1 commit
  22. 21 Jun, 2018 3 commits
    • Thomas Gleixner's avatar
      cpu/hotplug: Provide knobs to control SMT · 05736e4a
      Thomas Gleixner authored
      Provide a command line and a sysfs knob to control SMT.
      
      The command line options are:
      
       'nosmt':	Enumerate secondary threads, but do not online them
       		
       'nosmt=force': Ignore secondary threads completely during enumeration
       		via MP table and ACPI/MADT.
      
      The sysfs control file has the following states (read/write):
      
       'on':		 SMT is enabled. Secondary threads can be freely onlined
       'off':		 SMT is disabled. Secondary threads, even if enumerated
       		 cannot be onlined
       'forceoff':	 SMT is permanentely disabled. Writes to the control
       		 file are rejected.
       'notsupported': SMT is not supported by the CPU
      
      The command line option 'nosmt' sets the sysfs control to 'off'. This
      can be changed to 'on' to reenable SMT during runtime.
      
      The command line option 'nosmt=force' sets the sysfs control to
      'forceoff'. This cannot be changed during runtime.
      
      When SMT is 'on' and the control file is changed to 'off' then all online
      secondary threads are offlined and attempts to online a secondary thread
      later on are rejected.
      
      When SMT is 'off' and the control file is changed to 'on' then secondary
      threads can be onlined again. The 'off' -> 'on' transition does not
      automatically online the secondary threads.
      
      When the control file is set to 'forceoff', the behaviour is the same as
      setting it to 'off', but the operation is irreversible and later writes to
      the control file are rejected.
      
      When the control status is 'notsupported' then writes to the control file
      are rejected.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      05736e4a
    • Thomas Gleixner's avatar
      cpu/hotplug: Split do_cpu_down() · cc1fe215
      Thomas Gleixner authored
      Split out the inner workings of do_cpu_down() to allow reuse of that
      function for the upcoming SMT disabling mechanism.
      
      No functional change.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      cc1fe215
    • Thomas Gleixner's avatar
      cpu/hotplug: Make bringup/teardown of smp threads symmetric · c4de6569
      Thomas Gleixner authored
      The asymmetry caused a warning to trigger if the bootup was stopped in state
      CPUHP_AP_ONLINE_IDLE. The warning no longer triggers as kthread_park() can
      now be invoked on already or still parked threads. But there is still no
      reason to have this be asymmetric.
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: default avatarIngo Molnar <mingo@kernel.org>
      c4de6569
  23. 15 Mar, 2018 1 commit
  24. 14 Mar, 2018 1 commit
  25. 29 Dec, 2017 1 commit
  26. 27 Dec, 2017 1 commit
  27. 06 Dec, 2017 1 commit
  28. 28 Nov, 2017 1 commit
  29. 21 Oct, 2017 1 commit
    • Thomas Gleixner's avatar
      cpu/hotplug: Reset node state after operation · 1f7c70d6
      Thomas Gleixner authored
      The recent rework of the cpu hotplug internals changed the usage of the per
      cpu state->node field, but missed to clean it up after usage.
      
      So subsequent hotplug operations use the stale pointer from a previous
      operation and hand it into the callback functions. The callbacks then
      dereference a pointer which either belongs to a different facility or
      points to freed and potentially reused memory. In either case data
      corruption and crashes are the obvious consequence.
      
      Reset the node and the last pointers in the per cpu state to NULL after the
      operation which set them has completed.
      
      Fixes: 96abb968549c ("smp/hotplug: Allow external multi-instance rollback")
      Reported-by: default avatarTvrtko Ursulin <tursulin@ursulin.net>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1710211606130.3213@nanos
      1f7c70d6
  30. 25 Sep, 2017 6 commits
    • Peter Zijlstra's avatar
      smp/hotplug: Hotplug state fail injection · 1db49484
      Peter Zijlstra authored
      Add a sysfs file to one-time fail a specific state. This can be used
      to test the state rollback code paths.
      
      Something like this (hotplug-up.sh):
      
        #!/bin/bash
      
        echo 0 > /debug/sched_debug
        echo 1 > /debug/tracing/events/cpuhp/enable
      
        ALL_STATES=`cat /sys/devices/system/cpu/hotplug/states | cut -d':' -f1`
        STATES=${1:-$ALL_STATES}
      
        for state in $STATES
        do
      	  echo 0 > /sys/devices/system/cpu/cpu1/online
      	  echo 0 > /debug/tracing/trace
      	  echo Fail state: $state
      	  echo $state > /sys/devices/system/cpu/cpu1/hotplug/fail
      	  cat /sys/devices/system/cpu/cpu1/hotplug/fail
      	  echo 1 > /sys/devices/system/cpu/cpu1/online
      
      	  cat /debug/tracing/trace > hotfail-${state}.trace
      
      	  sleep 1
        done
      
      Can be used to test for all possible rollback (barring multi-instance)
      scenarios on CPU-up, CPU-down is a trivial modification of the above.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: bigeasy@linutronix.de
      Cc: efault@gmx.de
      Cc: rostedt@goodmis.org
      Cc: max.byungchul.park@gmail.com
      Link: https://lkml.kernel.org/r/20170920170546.972581715@infradead.org
      
      1db49484
    • Peter Zijlstra's avatar
      smp/hotplug: Differentiate the AP completion between up and down · 5ebe7742
      Peter Zijlstra authored
      With lockdep-crossrelease we get deadlock reports that span cpu-up and
      cpu-down chains. Such deadlocks cannot possibly happen because cpu-up
      and cpu-down are globally serialized.
      
        takedown_cpu()
          irq_lock_sparse()
          wait_for_completion(&st->done)
      
                                      cpuhp_thread_fun
                                        cpuhp_up_callback
                                          cpuhp_invoke_callback
                                            irq_affinity_online_cpu
                                              irq_local_spare()
                                              irq_unlock_sparse()
                                        complete(&st->done)
      
      Now that we have consistent AP state, we can trivially separate the
      AP completion between up and down using st->bringup.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Acked-by: max.byungchul.park@gmail.com
      Cc: bigeasy@linutronix.de
      Cc: efault@gmx.de
      Cc: rostedt@goodmis.org
      Link: https://lkml.kernel.org/r/20170920170546.872472799@infradead.org
      
      5ebe7742
    • Peter Zijlstra's avatar
      smp/hotplug: Differentiate the AP-work lockdep class between up and down · 5f4b55e1
      Peter Zijlstra authored
      With lockdep-crossrelease we get deadlock reports that span cpu-up and
      cpu-down chains. Such deadlocks cannot possibly happen because cpu-up
      and cpu-down are globally serialized.
      
        CPU0                  CPU1                    CPU2
        cpuhp_up_callbacks:   takedown_cpu:           cpuhp_thread_fun:
      
        cpuhp_state
                              irq_lock_sparse()
          irq_lock_sparse()
                              wait_for_completion()
                                                      cpuhp_state
                                                      complete()
      
      Now that we have consistent AP state, we can trivially separate the
      AP-work class between up and down using st->bringup.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: max.byungchul.park@gmail.com
      Cc: bigeasy@linutronix.de
      Cc: efault@gmx.de
      Cc: rostedt@goodmis.org
      Link: https://lkml.kernel.org/r/20170920170546.922524234@infradead.org
      
      5f4b55e1
    • Peter Zijlstra's avatar
      smp/hotplug: Callback vs state-machine consistency · 724a8688
      Peter Zijlstra authored
      While the generic callback functions have an 'int' return and thus
      appear to be allowed to return error, this is not true for all states.
      
      Specifically, what used to be STARTING/DYING are ran with IRQs
      disabled from critical parts of CPU bringup/teardown and are not
      allowed to fail. Add WARNs to enforce this rule.
      
      But since some callbacks are indeed allowed to fail, we have the
      situation where a state-machine rollback encounters a failure, in this
      case we're stuck, we can't go forward and we can't go back. Also add a
      WARN for that case.
      
      AFAICT this is a fundamental 'problem' with no real obvious solution.
      We want the 'prepare' callbacks to allow failure on either up or down.
      Typically on prepare-up this would be things like -ENOMEM from
      resource allocations, and the typical usage in prepare-down would be
      something like -EBUSY to avoid CPUs being taken away.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: bigeasy@linutronix.de
      Cc: efault@gmx.de
      Cc: rostedt@goodmis.org
      Cc: max.byungchul.park@gmail.com
      Link: https://lkml.kernel.org/r/20170920170546.819539119@infradead.org
      
      724a8688
    • Peter Zijlstra's avatar
      smp/hotplug: Rewrite AP state machine core · 4dddfb5f
      Peter Zijlstra authored
      There is currently no explicit state change on rollback. That is,
      st->bringup, st->rollback and st->target are not consistent when doing
      the rollback.
      
      Rework the AP state handling to be more coherent. This does mean we
      have to do a second AP kick-and-wait for rollback, but since rollback
      is the slow path of a slowpath, this really should not matter.
      
      Take this opportunity to simplify the AP thread function to only run a
      single callback per invocation. This unifies the three single/up/down
      modes is supports. The looping it used to do for up/down are achieved
      by retaining should_run and relying on the main smpboot_thread_fn()
      loop.
      
      (I have most of a patch that does the same for the BP state handling,
      but that's not critical and gets a little complicated because
      CPUHP_BRINGUP_CPU does the AP handoff from a callback, which gets
      recursive @st usage, I still have de-fugly that.)
      
      [ tglx: Move cpuhp_down_callbacks() et al. into the HOTPLUG_CPU section to
        	avoid gcc complaining about unused functions. Make the HOTPLUG_CPU
        	one piece instead of having two consecutive ifdef sections of the
        	same type. ]
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: bigeasy@linutronix.de
      Cc: efault@gmx.de
      Cc: rostedt@goodmis.org
      Cc: max.byungchul.park@gmail.com
      Link: https://lkml.kernel.org/r/20170920170546.769658088@infradead.org
      
      4dddfb5f
    • Peter Zijlstra's avatar
      smp/hotplug: Allow external multi-instance rollback · 96abb968
      Peter Zijlstra authored
      Currently the rollback of multi-instance states is handled inside
      cpuhp_invoke_callback(). The problem is that when we want to allow an
      explicit state change for rollback, we need to return from the
      function without doing the rollback.
      
      Change cpuhp_invoke_callback() to optionally return the multi-instance
      state, such that rollback can be done from a subsequent call.
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: bigeasy@linutronix.de
      Cc: efault@gmx.de
      Cc: rostedt@goodmis.org
      Cc: max.byungchul.park@gmail.com
      Link: https://lkml.kernel.org/r/20170920170546.720361181@infradead.org
      
      96abb968
  31. 14 Sep, 2017 1 commit
    • Thomas Gleixner's avatar
      watchdog/hardlockup/perf: Prevent CPU hotplug deadlock · 941154bd
      Thomas Gleixner authored
      The following deadlock is possible in the watchdog hotplug code:
      
        cpus_write_lock()
          ...
            takedown_cpu()
              smpboot_park_threads()
                smpboot_park_thread()
                  kthread_park()
                    ->park() := watchdog_disable()
                      watchdog_nmi_disable()
                        perf_event_release_kernel();
                          put_event()
                            _free_event()
                              ->destroy() := hw_perf_event_destroy()
                                x86_release_hardware()
                                  release_ds_buffers()
                                    get_online_cpus()
      
      when a per cpu watchdog perf event is destroyed which drops the last
      reference to the PMU hardware. The cleanup code there invokes
      get_online_cpus() which instantly deadlocks because the hotplug percpu
      rwsem is write locked.
      
      To solve this add a deferring mechanism:
      
        cpus_write_lock()
      			   kthread_park()
      			    watchdog_nmi_disable(deferred)
      			      perf_event_disable(event);
      			      move_event_to_deferred(event);
      			   ....
        cpus_write_unlock()
        cleaup_deferred_events()
          perf_event_release_kernel()
      
      This is still properly serialized against concurrent hotplug via the
      cpu_add_remove_lock, which is held by the task which initiated the hotplug
      event.
      
      This is also used to handle event destruction when the watchdog threads are
      parked via other mechanisms than CPU hotplug.
      Analyzed-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Reported-by: b0R1sLAV's avatarBorislav Petkov <bp@alien8.de>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Don Zickus's avatarDon Zickus <dzickus@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Link: http://lkml.kernel.org/r/20170912194146.884469246@linutronix.deSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      941154bd