• Linus Torvalds's avatar
    sched/fair: Fix infinite loop in update_blocked_averages() by reverting a9e7f654 · 7a400b91
    Linus Torvalds authored
    commit c40f7d74 upstream.
    
    Zhipeng Xie, Xie XiuQi and Sargun Dhillon reported lockups in the
    scheduler under high loads, starting at around the v4.18 time frame,
    and Zhipeng Xie tracked it down to bugs in the rq->leaf_cfs_rq_list
    manipulation.
    
    Do a (manual) revert of:
    
      a9e7f654 ("sched/fair: Fix O(nr_cgroups) in load balance path")
    
    It turns out that the list_del_leaf_cfs_rq() introduced by this commit
    is a surprising property that was not considered in followup commits
    such as:
    
      9c2791f9 ("sched/fair: Fix hierarchical order in rq->leaf_cfs_rq_list")
    
    As Vincent Guittot explains:
    
     "I think that there is a bigger problem with commit a9e7f654 and
      cfs_rq throttling:
    
      Let take the example of the following topology TG2 --> TG1 --> root:
    
       1) The 1st time a task is enqueued, we will add TG2 cfs_rq then TG1
          cfs_rq to leaf_cfs_rq_list and we are sure to do the whole branch in
          one path because it has never been used and can't be throttled so
          tmp_alone_branch will point to leaf_cfs_rq_list at the end.
    
       2) Then TG1 is throttled
    
       3) and we add TG3 as a new child of TG1.
    
       4) The 1st enqueue of a task on TG3 will add TG3 cfs_rq just before TG1
          cfs_rq and tmp_alone_branch will stay  on rq->leaf_cfs_rq_list.
    
      With commit a9e7f654, we can del a cfs_rq from rq->leaf_cfs_rq_list.
      So if the load of TG1 cfs_rq becomes NULL before step 2) above, TG1
      cfs_rq is removed from the list.
      Then at step 4), TG3 cfs_rq is added at the beginning of rq->leaf_cfs_rq_list
      but tmp_alone_branch still points to TG3 cfs_rq because its throttled
      parent can't be enqueued when the lock is released.
      tmp_alone_branch doesn't point to rq->leaf_cfs_rq_list whereas it should.
    
      So if TG3 cfs_rq is removed or destroyed before tmp_alone_branch
      points on another TG cfs_rq, the next TG cfs_rq that will be added,
      will be linked outside rq->leaf_cfs_rq_list - which is bad.
    
      In addition, we can break the ordering of the cfs_rq in
      rq->leaf_cfs_rq_list but this ordering is used to update and
      propagate the update from leaf down to root."
    
    Instead of trying to work through all these cases and trying to reproduce
    the very high loads that produced the lockup to begin with, simplify
    the code temporarily by reverting a9e7f654 - which change was clearly
    not thought through completely.
    
    This (hopefully) gives us a kernel that doesn't lock up so people
    can continue to enjoy their holidays without worrying about regressions. ;-)
    
    [ mingo: Wrote changelog, fixed weird spelling in code comment while at it. ]
    Analyzed-by: default avatarXie XiuQi <xiexiuqi@huawei.com>
    Analyzed-by: default avatarVincent Guittot <vincent.guittot@linaro.org>
    Reported-by: default avatarZhipeng Xie <xiezhipeng1@huawei.com>
    Reported-by: Sargun Dhillon's avatarSargun Dhillon <sargun@sargun.me>
    Reported-by: default avatarXie XiuQi <xiexiuqi@huawei.com>
    Tested-by: default avatarZhipeng Xie <xiezhipeng1@huawei.com>
    Tested-by: Sargun Dhillon's avatarSargun Dhillon <sargun@sargun.me>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    Acked-by: default avatarVincent Guittot <vincent.guittot@linaro.org>
    Cc: <stable@vger.kernel.org> # v4.13+
    Cc: Bin Li <huawei.libin@huawei.com>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Fixes: a9e7f654 ("sched/fair: Fix O(nr_cgroups) in load balance path")
    Link: http://lkml.kernel.org/r/1545879866-27809-1-git-send-email-xiexiuqi@huawei.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    7a400b91
Name
Last commit
Last update
Documentation Loading commit data...
LICENSES Loading commit data...
arch Loading commit data...
block Loading commit data...
certs Loading commit data...
crypto Loading commit data...
drivers Loading commit data...
firmware Loading commit data...
fs Loading commit data...
include Loading commit data...
init Loading commit data...
ipc Loading commit data...
kernel Loading commit data...
lib Loading commit data...
mm Loading commit data...
net Loading commit data...
samples Loading commit data...
scripts Loading commit data...
security Loading commit data...
sound Loading commit data...
tools Loading commit data...
usr Loading commit data...
virt Loading commit data...
.clang-format Loading commit data...
.cocciconfig Loading commit data...
.get_maintainer.ignore Loading commit data...
.gitattributes Loading commit data...
.gitignore Loading commit data...
.mailmap Loading commit data...
COPYING Loading commit data...
CREDITS Loading commit data...
Kbuild Loading commit data...
Kconfig Loading commit data...
MAINTAINERS Loading commit data...
Makefile Loading commit data...
README Loading commit data...