  • add UNLEAK annotation for reducing leak false positives · 0e5bba53
    Jeff King authored and Junio C Hamano committed
    
    
    It's a common pattern in git commands to allocate some
    memory that should last for the lifetime of the program and
    then not bother to free it, relying on the OS to throw it
    away.
    
    This keeps the code simple, and it's fast (we don't waste
    time traversing structures or calling free at the end of the
    program). But it also triggers warnings from memory-leak
    checkers like valgrind or LSAN. They know that the memory
    was still allocated at program exit, but they don't know
    _when_ the leaked memory stopped being useful. If it was
    early in the program, then it's probably a real and
    important leak. But if it was used right up until program
    exit, it's not an interesting leak and we'd like to suppress
    it so that we can see the real leaks.
    
    This patch introduces an UNLEAK() macro that lets us do so.
    To understand its design, let's first look at some of the
    alternatives.
    
    Unfortunately the suppression systems offered by
    leak-checking tools don't quite do what we want. A
    leak-checker basically knows two things:
    
      1. Which blocks were allocated via malloc, and the
         callstack during the allocation.
    
      2. Which blocks were left un-freed at the end of the
         program (and which are unreachable, but more on that
         later).
    
    Their suppressions work by mentioning the function or
    callstack of a particular allocation, and marking it as OK
    to leak.  So imagine you have code like this:
    
      int cmd_foo(...)
      {
    	/* this allocates some memory */
    	char *p = some_function();
    	printf("%s", p);
    	return 0;
      }
    
    You can say "ignore allocations from some_function(),
    they're not leaks". But that's not right. That function may
    be called elsewhere, too, and we would potentially want to
    know about those leaks.
    
    So you can say "ignore the callstack when cmd_foo calls
    some_function".  That works, but your annotations are
    brittle. In this case it's only two functions, but you can
    imagine that the actual allocation is much deeper. If any of
    the intermediate code changes, you have to update the
    suppression.
    
    What we _really_ want to say is that "the value assigned to
    p at the end of the function is not a real leak". But
    leak-checkers can't understand that; they don't know about
    "p" in the first place.
    
    However, we can do something a little bit tricky if we make
    some assumptions about how leak-checkers work. They
    generally don't just report all un-freed blocks. That would
    report even globals which are still accessible when the
    leak-check is run.  Instead they take some set of memory
    (like BSS) as a root and mark it as "reachable". Then they
    scan the reachable blocks for anything that looks like a
    pointer to a malloc'd block, and consider that block
    reachable. And then they scan those blocks, and so on,
    transitively marking anything reachable from a global as
    "not leaked" (or at least leaked in a different category).
    
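    To make the reachability walk concrete, here is a toy sketch of
    the marking pass described above. It is not how valgrind or LSAN
    are implemented; the names (struct block, track_block,
    find_block, scan_region) and the fixed-size table are assumptions
    made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_BLOCKS 16

/* A heap block the toy checker knows about (hypothetical bookkeeping). */
struct block {
	const void *start;
	size_t len;
	int reachable;
};

static struct block blocks[MAX_BLOCKS];
static size_t nr_blocks;

/* Record an allocation, as a real checker would by intercepting malloc(). */
static struct block *track_block(const void *start, size_t len)
{
	assert(nr_blocks < MAX_BLOCKS);
	blocks[nr_blocks].start = start;
	blocks[nr_blocks].len = len;
	blocks[nr_blocks].reachable = 0;
	return &blocks[nr_blocks++];
}

/* Return the tracked block containing "addr", or NULL if there is none. */
static struct block *find_block(uintptr_t addr)
{
	size_t i;

	for (i = 0; i < nr_blocks; i++) {
		uintptr_t start = (uintptr_t)blocks[i].start;
		if (addr >= start && addr - start < blocks[i].len)
			return &blocks[i];
	}
	return NULL;
}

/*
 * Treat each pointer-sized word of the region as a candidate pointer.
 * If it lands inside a tracked block, mark that block reachable and
 * scan the block's own bytes as well (the transitive step).
 */
static void scan_region(const void *region, size_t len)
{
	const unsigned char *bytes = region;
	size_t off;

	for (off = 0; off + sizeof(uintptr_t) <= len; off += sizeof(uintptr_t)) {
		uintptr_t word;
		struct block *b;

		memcpy(&word, bytes + off, sizeof(word));
		b = find_block(word);
		if (b && !b->reachable) {
			b->reachable = 1;
			scan_region(b->start, b->len);
		}
	}
}
```

    Calling scan_region() on a root region marks every block that a
    chain of pointer-looking words leads to; whatever stays unmarked
    is what would be reported as leaked.
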
    So we can mark the value of "p" as reachable by putting it
    into a variable with program lifetime. One way to do that is
    to just mark "p" as static. But that actually affects the
    run-time behavior if the function is called twice (you
    aren't likely to call main() twice, but some of our cmd_*()
    functions are called from other commands).
    
    Instead, we can trick the leak-checker by putting the value
    into _any_ reachable bytes. This patch keeps a global
    linked-list of bytes copied from "unleaked" variables. That
    list is reachable even at program exit, which confers
    recursive reachability on whatever values we unleak.
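
    A minimal sketch of such a list, assuming a helper called
    unleak_memory() and a node type called struct unleak_root (both
    names, and the extra "len" field, are illustrative and may differ
    from what the patch itself does):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One node per annotated variable; "data" holds a copy of its raw bytes. */
struct unleak_root {
	struct unleak_root *next;
	size_t len;		/* illustrative; lets the example be inspected */
	unsigned char data[];	/* C99 flexible array member */
};

/* Program-lifetime list head, so a leak-checker can start scanning here. */
static struct unleak_root *unleak_roots;

/* Copy "len" bytes starting at "ptr" into a new node and push it on the list. */
static void unleak_memory(const void *ptr, size_t len)
{
	struct unleak_root *root;

	assert(ptr != NULL);
	root = malloc(sizeof(*root) + len);
	if (!root)
		return;	/* only an annotation; failing to record it is harmless */
	root->len = len;
	memcpy(root->data, ptr, len);
	root->next = unleak_roots;
	unleak_roots = root;
}

/* Copies the variable's own bytes; nothing is dereferenced or freed. */
#define UNLEAK(var) unleak_memory(&(var), sizeof(var))
```

    Because only the bytes of the variable are copied, the same macro
    works for a plain pointer and for a whole struct passed by name.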
    
    In other words, you can do:
    
      int cmd_foo(...)
      {
    	char *p = some_function();
    	printf("%s", p);
    	UNLEAK(p);
    	return 0;
      }
    
    to annotate "p" and suppress the leak report.
    
    But wait, couldn't we just say "free(p)"? In this toy
    example, yes. But UNLEAK()'s byte-copying strategy has
    several advantages over actually freeing the memory:
    
      1. It's recursive across structures. In many cases our "p"
         is not just a pointer, but a complex struct whose
         fields may have been allocated by a sub-function. And
         in some cases (e.g., dir_struct) we don't even have a
         function which knows how to free all of the struct
         members.
    
         By marking the struct itself as reachable, that confers
         reachability on any pointers it contains (including those
         found in embedded structs, or reachable by walking
         heap blocks recursively).
    
      2. It works on cases where we're not sure if the value is
         allocated or not. For example:
    
           char *p = argc > 1 ? argv[1] : some_function();
    
         It's safe to use UNLEAK(p) here, because it's not
         freeing any memory. In the case that we're pointing to
         argv here, the reachability checker will just ignore
         our bytes.
    
      3. Likewise, it works even if the variable has _already_
         been freed. We're just copying the pointer bytes. If
         the block has been freed, the leak-checker will skip
         over those bytes as uninteresting.
    
      4. Because it's not actually freeing memory, you can
         UNLEAK() before you are finished accessing the variable.
         This is helpful in cases like this:
    
           char *p = some_function();
           return another_function(p);
    
         Writing this with free() requires:
    
           int ret;
           char *p = some_function();
           ret = another_function(p);
           free(p);
           return ret;
    
         But with UNLEAK() we can just write:
    
           char *p = some_function();
           UNLEAK(p);
           return another_function(p);
    
    This patch adds the UNLEAK() macro and enables it
    automatically when Git is compiled with SANITIZE=leak.  In
    normal builds it's a noop, so we pay no runtime cost.
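
    The compile-time switch could look roughly like the sketch below.
    The guard name SUPPRESS_ANNOTATED_LEAKS and the helper
    annotate_and_return() are assumptions for illustration; this
    message only names the SANITIZE=leak build knob, and
    unleak_memory() is assumed to be defined elsewhere in a
    leak-checking build.

```c
#include <assert.h>
#include <stddef.h>

#ifdef SUPPRESS_ANNOTATED_LEAKS
/* Leak-checking build: record the variable's bytes (defined elsewhere). */
void unleak_memory(const void *ptr, size_t len);
#define UNLEAK(var) unleak_memory(&(var), sizeof(var))
#else
/* Normal build: expands to an empty statement, so nothing runs. */
#define UNLEAK(var) do { } while (0)
#endif

/* Hypothetical caller: the annotation never changes the variable itself. */
static const char *annotate_and_return(const char *p)
{
	UNLEAK(p);
	assert(p != NULL);
	return p;
}
```

    Wrapping the no-op in do { } while (0) keeps UNLEAK(p); a single
    statement, so it behaves the same in either build when used
    inside an unbraced if or else.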
    
    It also adds some UNLEAK() annotations to show off how the
    feature works. On top of other recent leak fixes, these are
    enough to get t0000 and t0001 to pass when compiled with
    LSAN.
    
    Note the case in commit.c which actually converts a
    strbuf_release() into an UNLEAK. This code was already
    non-leaky, but the free didn't do anything useful, since
    we're exiting. Converting it to an annotation means that
    non-leak-checking builds pay no runtime cost. The cost is
    minimal enough that it's probably not worth going on a
    crusade to convert these kinds of frees to UNLEAK()s. I did it
    here for consistency with the "sb" leak (though it would
    have been equally correct to go the other way, and turn them
    both into strbuf_release() calls).
    
    Signed-off-by: Jeff King <peff@peff.net>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>