    pidns: guarantee that the pidns init will be the last pidns process reaped · 6347e900
    Eric W. Biederman authored
    
    
    Today we have a twofold bug.  Sometimes release_task on pid == 1 in a pid
    namespace can run before other processes in the pid namespace have had
    release_task called.  As a result, pid_ns_release_proc can be called
    before the last proc_flush_task() is done using upid->ns->proc_mnt,
    resulting in the use of a stale pointer.  The same set of circumstances
    can lead to waitpid(...) returning for a process started with
    clone(CLONE_NEWPID) before every process in the pid namespace has
    actually exited.
    
    To fix this, modify zap_pid_ns_processes to wait until all other
    processes in the pid namespace have exited, even EXIT_DEAD zombies.
    
    The delay_group_leader and related tests ensure that the thread group
    leader will be the last thread of its thread group to be reaped, or to
    become EXIT_DEAD and self reap.  With the change to zap_pid_ns_processes
    we get the guarantee that pid == 1 in a pid namespace will be the last
    task that release_task is called on.
    
    With pid == 1 being the last task to pass through release_task,
    pid_ns_release_proc can no longer be called too early, nor can wait
    return before all of the EXIT_DEAD tasks in a pid namespace have exited.
    
    Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    Cc: Louis Rilling <louis.rilling@kerlabs.com>
    Cc: Mike Galbraith <efault@gmx.de>
    Acked-by: Pavel Emelyanov <xemul@parallels.com>
    Tested-by: Andrew Wagin <avagin@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>