Skip to content
  • Ævar Arnfjörð Bjarmason's avatar
    die(): stop hiding errors due to overzealous recursion guard · 2d3c02f5
    Ævar Arnfjörð Bjarmason authored and Junio C Hamano's avatar Junio C Hamano committed
    Change the recursion limit for the default die routine from a *very*
    low 1 to 1024. This ensures that infinite recursions are broken, but
    doesn't lose the meaningful error messages under threaded execution
    where threads concurrently start to die.
    
    The intent of the existing code, as explained in commit
    cd163d4b ("usage.c: detect recursion in die routines and bail out
    immediately", 2012-11-14), is to break infinite recursion in cases
    where the die routine itself calls die(), and would thus infinitely
    recurse.
    
    However, doing that very aggressively by immediately printing out
    "recursion detected in die handler" if we've already called die() once
    means that threaded invocations of git can end up only printing out
    the "recursion detected" error, while hiding the meaningful error.
    
    An example of this is running a threaded grep which dies on execution
    against pretty much any repo, git.git will do:
    
        git grep -P --threads=8 '(*LIMIT_MATCH=1)-?-?-?---$'
    
    With the current version of git this will print some combination of
    multiple PCRE failures that caused the abort and multiple "recursion
    detected", some invocations will print out multiple "recursion
    detected" errors with no PCRE error at all!
    
    Before this change, running the above grep command 1000 times against
    git.git[1] and taking the top 20 results will on my system yield the
    following distribution of actual errors ("E") and recursion
    errors ("R"):
    
        322 E R
        306 E
        116 E R R
         65 R R
         54 R E
         49 E E
         44 R
         15 E R R R
          9 R R R
          7 R E R
          5 R R E
          3 E R R R R
          2 E E R
          1 R R R R
          1 R R R E
          1 R E R R
    
    The exact results are obviously random and system-dependent, but this
    shows the race condition in this code. Some small part of the time
    we're about to print out the actual error ("E") but another thread's
    recursion error beats us to it, and sometimes we print out nothing but
    the recursion error.
    
    With this change we get, now with "W" to mean the new warning being
    emitted indicating that we've called die() many times:
    
        502 E
        160 E W E
        120 E E
         53 E W
         35 E W E E
         34 W E E
         29 W E E E
         16 E E W
         16 E E E
         11 W E E E E
          7 E E W E
          4 W E
          3 W W E E
          2 E W E E E
          1 W W E
          1 W E W E
          1 E W W E E E
          1 E W W E E
          1 E W W E
          1 E W E E W
    
    Which still sucks a bit, due to a still present race-condition in this
    code we're sometimes going to print out several errors still, or
    several warnings, or two duplicate errors without the warning.
    
    But we will never have a case where we completely hide the actual
    error as we do now.
    
    Now, git-grep could make use of the pluggable error facility added in
    commit c19a490e
    
     ("usage: allow pluggable die-recursion checks",
    2013-04-16). There's other threaded code that calls set_die_routine()
    or set_die_is_recursing_routine().
    
    But this is about fixing the general die() behavior with threading
    when we don't have such a custom routine yet. Right now the common
    case is not an infinite recursion in the handler, but us losing error
    messages by default because we're overly paranoid about our recursion
    check.
    
    So let's just set the recursion limit to a number higher than the
    number of threads we're ever likely to spawn. Now we won't lose
    errors, and if we have a recursing die handler we'll still die within
    microseconds.
    
    There are race conditions in this code itself, in particular the
    "dying" variable is not thread mutexed, so we e.g. won't be dying at
    exactly 1024, or for that matter even be able to accurately test
    "dying == 2", see the cases where we print out more than one "W"
    above.
    
    But that doesn't really matter, for the recursion guard we just need
    to die "soon", not at exactly 1024 calls, and for printing the correct
    error and only one warning most of the time in the face of threaded
    death this is good enough and a net improvement on the current code.
    
    1. for i in {1..1000}; do git grep -P --threads=8 '(*LIMIT_MATCH=1)-?-?-?---$' 2>&1|perl -pe 's/^fatal: r.*/R/; s/^fatal: p.*/E/; s/^warning.*/W/' | tr '\n' ' '; echo; done | sort | uniq -c | sort -nr | head -n 20
    
    Signed-off-by: default avatarÆvar Arnfjörð Bjarmason <avarab@gmail.com>
    Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
    2d3c02f5