    mm, page_alloc: do not break __GFP_THISNODE by zonelist reset · 7810e678
    Vlastimil Babka authored
    In __alloc_pages_slowpath() we reset zonelist and preferred_zoneref for
    allocations that can ignore memory policies.  The zonelist is obtained
    from the current CPU's node.  This is a problem for __GFP_THISNODE
    allocations that want to allocate on a different node, e.g. because the
    allocating thread has been migrated to a different CPU.
    
    This has been observed to break SLAB in our 4.4-based kernel, because
    there it relies on __GFP_THISNODE working as intended.  If a slab page
    is put on the wrong node's list, then further list manipulations may
    corrupt the list, because page_to_nid() is used to determine which
    node's list_lock should be locked, and thus we may take the wrong lock
    and race.
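
The lock mismatch described above can be sketched in a small userspace toy model. This is not kernel code: `toy_page`, `toy_cache_grow()` and `toy_lock_protects_list()` are hypothetical names invented for illustration, and the model only tracks which node a page physically lives on versus which node's list it was linked into.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_NODES 2

/* Hypothetical stand-in for a slab page; not the kernel's struct page. */
struct toy_page {
    int physical_nid; /* what a page_to_nid()-style lookup reports */
    int list_nid;     /* which node's slab list the page was linked into */
};

/* Toy equivalent of page_to_nid(): report the node the memory lives on. */
static int toy_page_to_nid(const struct toy_page *page)
{
    assert(page->physical_nid >= 0 && page->physical_nid < TOY_NR_NODES);
    return page->physical_nid;
}

/*
 * Model growing a cache: the caller asked for requested_nid and, trusting
 * the node-exact allocation, links the page into that node's list even if
 * the allocator actually returned memory from actual_nid.
 */
static struct toy_page toy_cache_grow(int requested_nid, int actual_nid)
{
    struct toy_page page = {
        .physical_nid = actual_nid,
        .list_nid = requested_nid,
    };
    return page;
}

/*
 * Later list manipulation picks the lock from toy_page_to_nid(). Return
 * true only if that lock belongs to the list that really holds the page.
 */
static bool toy_lock_protects_list(const struct toy_page *page)
{
    int locked_nid = toy_page_to_nid(page);
    return locked_nid == page->list_nid;
}
```

In this model, `toy_cache_grow(1, 1)` yields a page for which `toy_lock_protects_list()` is true, while `toy_cache_grow(1, 0)` yields one for which it is false: node 0's lock is taken while node 1's list is modified.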
    
    The current SLAB implementation seems to be immune by luck thanks to
    commit 511e3a05 ("mm/slab: make cache_grow() handle the page allocated
    on arbitrary node") but there may be other users assuming that
    __GFP_THISNODE works as promised.
    
    We can fix it by simply removing the zonelist reset completely.  There
    is actually no reason to reset it, because memory policies and cpusets
    don't affect the zonelist choice in the first place.  This was different
    when commit 183f6371 ("mm: ignore mempolicies when using
    ALLOC_NO_WATERMARK") introduced the code, as mempolicies provided their
    own restricted zonelists.
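
The effect of the zonelist reset on a node-exact allocation can be sketched in userspace as follows. This is a toy model under stated assumptions, not the kernel implementation: `toy_zonelist`, `toy_node_zonelist()`, `toy_alloc_slowpath()` and `TOY_GFP_THISNODE` are hypothetical names, a zonelist is reduced to an ordered list of node ids, and the first node in the list is assumed to always have free memory.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_NODES 2
#define TOY_GFP_THISNODE 0x1u /* hypothetical stand-in for __GFP_THISNODE */

/* Hypothetical, simplified zonelist: the ordered nodes to try. */
struct toy_zonelist {
    int nodes[TOY_NR_NODES];
    int nr;
};

/* Build the list for nid; with TOY_GFP_THISNODE there is no fallback node. */
static struct toy_zonelist toy_node_zonelist(int nid, unsigned int gfp)
{
    struct toy_zonelist zl = { .nr = 0 };

    assert(nid >= 0 && nid < TOY_NR_NODES);
    zl.nodes[zl.nr++] = nid;
    if (!(gfp & TOY_GFP_THISNODE)) {
        for (int n = 0; n < TOY_NR_NODES; n++)
            if (n != nid)
                zl.nodes[zl.nr++] = n;
    }
    return zl;
}

/*
 * Toy slow path: returns the node the page comes from. When reset_zonelist
 * is true, the list is rebuilt from the current CPU's node, which models
 * the behaviour this commit removes.
 */
static int toy_alloc_slowpath(int requested_nid, int current_cpu_nid,
                              unsigned int gfp, bool reset_zonelist)
{
    struct toy_zonelist zl = toy_node_zonelist(requested_nid, gfp);

    if (reset_zonelist)
        zl = toy_node_zonelist(current_cpu_nid, gfp);

    /* Assumption: the first node in the list always has free memory. */
    return zl.nodes[0];
}
```

With the reset, `toy_alloc_slowpath(1, 0, TOY_GFP_THISNODE, true)` returns node 0 even though node 1 was requested; without it, the same call with `false` returns node 1, which is what a node-exact request expects.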
    
    We might consider this for 4.17 although I don't know if there's
    anything currently broken.
    
    SLAB is currently not affected, but in kernels older than 4.7 that don't
    yet have 511e3a05 ("mm/slab: make cache_grow() handle the page
    allocated on arbitrary node") it is.  That's at least 4.4 LTS.  Older
    ones I'll have to check.
    
    So stable backports should be more important, but will have to be
    reviewed carefully, as the code went through many changes.  BTW I think
    that also the ac->preferred_zoneref reset is currently useless if we
    don't also reset ac->nodemask from a mempolicy to NULL first (which we
    probably should for the OOM victims etc?), but I would leave that for a
    separate patch.
    
    Link: http://lkml.kernel.org/r/20180525130853.13915-1-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    Fixes: 183f6371 ("mm: ignore mempolicies when using ALLOC_NO_WATERMARK")
    Acked-by: Mel Gorman <mgorman@techsingularity.net>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Documentation
LICENSES
arch
block
certs
crypto
drivers
firmware
fs
include
init
ipc
kernel
lib
mm
net
samples
scripts
security
sound
tools
usr
virt
.clang-format
.cocciconfig
.get_maintainer.ignore
.gitattributes
.gitignore
.mailmap
COPYING
CREDITS
Kbuild
Kconfig
MAINTAINERS
Makefile
README