    proc: workarounds for runtime.clone (#1470) · 520d7924
    Alessandro Arzilli authored and Derek Parker committed
    runtime.clone (on some operating systems?) works similarly to fork:
    when a thread calls runtime.clone a new thread is created. For a
    short period of time both the parent thread and the child thread
    appear to be running the same goroutine, until the child thread
    adjusts its TLS to point to the correct goroutine.
    
    This means that proc.GetG for a thread that's currently running
    'runtime.clone' could be wrong and, consequently, the field
    proc.(G).thread of a G struct returned by GoroutinesInfo could
    also be wrong. As a result, FindGoroutine could sometimes return
    a *G with a bad associated thread if the goroutine of interest
    recently called 'runtime.clone'.
    
    To work around this problem this commit makes two changes:
    
    1. proc.GetG will return nil for all threads executing runtime.clone.
    2. FindGoroutine will return the selected goroutine as long as the
       ID matches the one requested.
    
    Change (1) takes care of the 'runtime.clone' problem. If we stop
    the target process shortly after a thread executed the SYSCALL
    instruction in 'runtime.clone' there are three possibilities:
    
    a. Both the parent thread and the child thread are stopped inside
    'runtime.clone'. In this case the state we report is slightly
    incorrect, because both threads will be reported as not running any
    goroutine when we do know which goroutine one of them (the parent)
    is running. This doesn't actually matter since runtime.clone is
    always called on the system stack and therefore the goroutine in
    runtime.allgs will have the correct location.
    
    b. The child thread managed to exit 'runtime.clone' but the parent
    thread didn't. This is similar to (a) but in this case GetG on the
    child thread will return the correct goroutine. GetG on the parent
    thread will still return (incorrectly) nil but this doesn't matter
    for the same reason as described in (a).
    
    c. The parent thread managed to exit 'runtime.clone' but the child
    thread didn't. In this case GetG will return the correct goroutine
    both for the parent thread (because it's not executing runtime.clone)
    and the child thread.
    
    Change (2) means that even if a thread has a completely nonsensical
    TLS (for example because it's set through cgo) evaluating a variable
    with a valid GoroutineID will still work as long as it's the current
    goroutine (which is the most common case). This change also doubles
    as an optimization for FindGoroutine.
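
    Change (2) can be pictured with the minimal sketch below. The
    findGoroutine helper, its parameters and the linear scan used as
    the fallback are hypothetical and chosen for the example; the
    commit only states that the selected goroutine is returned when
    its ID matches the requested one.

```go
package main

import "fmt"

// G is a stand-in for a goroutine descriptor (hypothetical, not proc.G).
type G struct {
	ID int
}

// findGoroutine sketches change (2): if the currently selected goroutine has
// the requested ID it is returned directly, without consulting any thread's
// TLS. Otherwise the sketch falls back to scanning a list of all goroutines.
func findGoroutine(selected *G, all []*G, id int) *G {
	if selected != nil && selected.ID == id {
		return selected // fast path, independent of thread state
	}
	for _, g := range all {
		if g.ID == id {
			return g
		}
	}
	return nil // no goroutine with this ID
}

func main() {
	g1, g2 := &G{ID: 1}, &G{ID: 2}
	all := []*G{g1, g2}

	fmt.Println(findGoroutine(g1, all, 1) == g1) // served by the fast path
	fmt.Println(findGoroutine(g1, all, 2) == g2) // served by the scan
	fmt.Println(findGoroutine(g1, all, 3) == nil)
}
```

    The early return is also what makes this an optimization: the
    common case of looking up the current goroutine never reaches the
    scan.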
    
    Fixes #1469