tick/rcu: fix NOHZ tick-stop when performing DLPAR proc remove on ppc64le [P10]

Desnes Nunes requested to merge desnesn/centos-stream-9:rh2059555 into main

BUGZILLA

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2059555

UPSTREAM STATUS

Upstream Status: Patches have been accepted on kernel/git/next/linux-next.git

CONFLICTS

None

BUILD INFORMATION

Build Info: http://brewweb.engineering.redhat.com/brew/taskinfo?taskID=44049612

TESTING

The patched kernel was used to run a DLPAR proc remove operation, as well as changes to SMT configurations. During both tests, the following error messages were no longer seen on the LPAR's console:

...
[  136.161651] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
[  136.541506] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
[  136.891360] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
...
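
The console check can also be scripted. The following is a minimal sketch, not part of this series, that scans captured console text for the message quoted above. The helper names `find_tick_stop_errors` and `decode_handler` are hypothetical. The decoding step assumes the handler field is a hexadecimal bitmask of pending softirq vectors in the kernel's enum order, which should be confirmed against kernel/time/tick-sched.c for the kernel under test.

```python
import re

# Matches the warning exactly as quoted in the TESTING section.
# The handler field is captured as text and interpreted separately.
TICK_STOP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NOHZ tick-stop error: "
    r"Non-RCU local softirq work is pending, "
    r"handler #(?P<handler>[0-9a-fA-F]+)!!!"
)

# Softirq vector names, bit 0 first, following the enum order in
# include/linux/interrupt.h.
SOFTIRQ_NAMES = [
    "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK",
    "IRQ_POLL", "TASKLET", "SCHED", "HRTIMER", "RCU",
]


def find_tick_stop_errors(console_text):
    """Return a list of (timestamp, handler) tuples, one per warning found."""
    return [
        (float(m.group("ts")), m.group("handler"))
        for m in TICK_STOP_RE.finditer(console_text)
    ]


def decode_handler(handler):
    """Map the handler field to softirq vector names.

    ASSUMPTION: the field is a hex bitmask of pending softirq vectors.
    """
    mask = int(handler, 16)
    return [name for bit, name in enumerate(SOFTIRQ_NAMES) if mask & (1 << bit)]
```

An empty list from `find_tick_stop_errors` on the console capture of a DLPAR or SMT test run corresponds to the passing result described above.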

DESCRIPTION

A NOHZ tick-stop error has been observed when executing DLPAR proc remove operations, as well as changes of SMT configuration on powerpc systems.

This happens mostly because the RHEL code still carries the CONFIG_RCU_FAST_NO_HZ code, which had a safeguard of raising RCU_SOFTIRQ to prevent stopping the tick on idle paths.

Thus, the CONFIG_RCU_FAST_NO_HZ code is being removed in this series, since the RCU_SOFTIRQ vector is now expected to be raised only from sane places.

Signed-off-by: Desnes A. Nunes do Rosario <drosario@redhat.com>