Skip to content
  • Jeff King's avatar
    sha1_file: drop experimental GIT_USE_LOOKUP search · f1068efe
    Jeff King authored and Junio C Hamano's avatar Junio C Hamano committed
    Long ago in 628522ec
    
     (sha1-lookup: more memory efficient
    search in sorted list of SHA-1, 2007-12-29) we added
    sha1_entry_pos(), a binary search that uses the uniform
    distribution of sha1s to scale the selection of mid-points.
    As this was a performance experiment, we tied it to the
    GIT_USE_LOOKUP environment variable and never enabled it by
    default.
    
    This code was successful in reducing the number of steps in
    each search. But the overhead of the scaling ends up making
    it slower when the cache is warm. Here are best-of-five
    timings for running rev-list on linux.git, which will have
    to look up every object:
    
      $ time git rev-list --objects --all >/dev/null
      real	0m35.357s
      user	0m35.016s
      sys	0m0.340s
    
      $ time GIT_USE_LOOKUP=1 git rev-list --objects --all >/dev/null
      real	0m37.364s
      user	0m37.045s
      sys	0m0.316s
    
    The USE_LOOKUP version might have more benefit on a cold
    cache, as the time to fault in each page would dominate. But
    that would be for a single lookup. In practice, most
    operations tend to look up many objects, and the whole pack
    .idx will end up warm.
    
    It's possible that the code could be better optimized to
    compete with a naive binary search for the warm-cache case,
    and we could have the best of both worlds. But over the
    years nobody has done so, and this is largely dead code that
    is rarely run outside of the test suite. Let's drop it in
    the name of simplicity.
    
    This lets us remove sha1_entry_pos() entirely, as the .idx
    lookup code was the only caller.  Note that sha1-lookup.c
    still contains sha1_pos(), which differs from
    sha1_entry_pos() in two ways:
    
      - it has a different interface; it uses a function pointer
        to access sha1 entries rather than a size/offset pair
        describing the table's memory layout
    
      - it only scales the initial selection of "mi", rather
        than each iteration of the search
    
    We can't get rid of this function, as it's called from
    several places. It may be that we could replace it with a
    simple binary search, but that's out of scope for this patch
    (and would need benchmarking).
    
    Signed-off-by: default avatarJeff King <peff@peff.net>
    Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
    f1068efe