  • proc/native: fix race condition between Halt and process death (linux) · f32ce1b2
    aarzilli authored and Derek Parker committed
    If a breakpoint is hit close to process death on a thread that isn't
    the group leader, the process could die while we are trying to stop it.
    
    This can be easily reproduced by having the goroutine that's executing
    main.main (which will almost always run on the thread group leader)
    wait for a second goroutine before exiting, then setting a breakpoint
    on the second goroutine and stepping through it (see TestIssue1101 in
    proc_test.go).
    
    When stepping over the return instruction of main.f, the deferred
    wg.Done() call is executed, which causes the main goroutine to
    resume and proceed to exit. Both the temporary breakpoint on wg.Done
    and the temporary breakpoint on the return address of main.f will be
    hit at about the same time as main.main calls os.Exit() and causes
    the death of the thread group leader.
    
    Under these circumstances the call to native.(*Thread).waitFast in
    native.(*Thread).halt can hang forever due to a bug similar to
    https://sourceware.org/bugzilla/show_bug.cgi?id=12702 (see comment in
    native.(*Thread).wait for an explanation).
    
    Replacing waitFast with a normal wait works in most circumstances.
    However, besides the performance hit, it looks like in these
    circumstances trapWait sometimes receives a spurious SIGTRAP on the
    dying group leader, which would cause the subsequent call to wait in
    halt to accidentally reap the process without noting that it exited.
    
    Instead, this patch removes the call to wait from halt and calls
    trapWait in a loop in setCurrentBreakpoints until all threads are set
    to running=false. This is also a better fix than the workaround for
    the ESRCH error while setting current breakpoints implemented in 94b50d.
    
    Fixes #1101