Skip to content
  • Jeff King's avatar
    revision: use a prio_queue to hold rewritten parents · 8320b1db
    Jeff King authored and Junio C Hamano's avatar Junio C Hamano committed
    This patch fixes a quadratic list insertion in rewrite_one() when
    pathspec limiting is combined with --parents. What happens is something
    like this:
    
      1. We see that some commit X touches the path, so we try to rewrite
         its parents.
    
      2. rewrite_one() loops forever, rewriting parents, until it finds a
         relevant parent (or hits the root and decides there are none). The
         heavy lifting is done by process_parent(), which uses
         try_to_simplify_commit() to drop parents.
    
      3. process_parent() puts any intermediate parents into the
         &revs->commits list, inserting by commit date as usual.
    
    So if commit X is recent, and then there's a large chunk of history that
    doesn't touch the path, we may add a lot of commits to &revs->commits.
    And insertion by commit date is O(n) in the worst case, making the whole
    thing quadratic.
    
    We tried to deal with this long ago in fce87ae5
    
     (Fix quadratic
    performance in rewrite_one., 2008-07-12). In that scheme, we cache the
    oldest commit in the list; if the new commit to be added is older, we
    can start our linear traversal there. This often works well in practice
    because parents are older than their descendants, and thus we tend to
    add older and older commits as we traverse.
    
    But this isn't guaranteed, and in fact there's a simple case where it is
    not: merges. Imagine we look at the first parent of a merge and see a
    very old commit (let's say 3 years old). And on the second parent, as we
    go back 3 years in history, we might have many commits. That one
    first-parent commit has polluted our oldest-commit cache; it will remain
    the oldest while we traverse a huge chunk of history, during which we
    have to fall back to the slow, linear method of adding to the list.
    
    Naively, one might imagine that instead of caching the oldest commit,
    we'd start at the last-added one. But that just makes some cases faster
    while making others slower (and indeed, while it made a real-world test
    case much faster, it does quite poorly in the perf test include here).
    Fundamentally, these are just heuristics; our worst case is still
    quadratic, and some cases will approach that.
    
    Instead, let's use a data structure with better worst-case performance.
    Swapping out revs->commits for something else would have repercussions
    all over the code base, but we can take advantage of one fact: for the
    rewrite_one() case, nobody actually needs to see those commits in
    revs->commits until we've finished generating the whole list.
    
    That leaves us with two obvious options:
    
      1. We can generate the list _unordered_, which should be O(n), and
         then sort it afterwards, which would be O(n log n) total. This is
         "sort-after" below.
    
      2. We can insert the commits into a separate data structure, like a
         priority queue. This is "prio-queue" below.
    
    I expected that sort-after would be the fastest (since it saves us the
    extra step of copying the items into the linked list), but surprisingly
    the prio-queue seems to be a bit faster.
    
    Here are timings for the new p0001.6 for all three techniques across a
    few repositories, as compared to master:
    
    master              cache-last                sort-after              prio-queue
    --------------------------------------------------------------------------------------------
    GIT_PERF_REPO=git.git
    0.52(0.50+0.02)      0.53(0.51+0.02)  +1.9%   0.37(0.33+0.03) -28.8%  0.37(0.32+0.04) -28.8%
    
    GIT_PERF_REPO=linux.git
    20.81(20.74+0.07)   20.31(20.24+0.07) -2.4%   0.94(0.86+0.07) -95.5%  0.91(0.82+0.09) -95.6%
    
    GIT_PERF_REPO=llvm-project.git
    83.67(83.57+0.09)    4.23(4.15+0.08) -94.9%   3.21(3.15+0.06) -96.2%  2.98(2.91+0.07) -96.4%
    
    A few items to note:
    
      - the cache-list tweak does improve the bad case for llvm-project.git
        that started my digging into this problem. But it performs terribly
        on linux.git, barely helping at all.
    
      - the sort-after and prio-queue techniques work well. They approach
        the timing for running without --parents at all, which is what you'd
        expect (see below for more data).
    
      - prio-queue just barely outperforms sort-after. As I said, I'm not
        really sure why this is the case, but it is. You can see it even
        more prominently in this real-world case on llvm-project.git:
    
          git rev-list --parents 07ef786652e7 -- llvm/test/CodeGen/Generic/bswap.ll
    
        where prio-queue routinely outperforms sort-after by about 7%. One
        guess is that the prio-queue may just be more efficient because it
        uses a compact array.
    
    There are three new perf tests:
    
      - "rev-list --parents" gives us a baseline for running with --parents.
        This isn't sped up meaningfully here, because the bad case is
        triggered only with simplification. But it's good to make sure we
        don't screw it up (now, or in the future).
    
      - "rev-list -- dummy" gives us a baseline for just traversing with
        pathspec limiting. This gives a lower bound for the next test (and
        it's also a good thing for us to be checking in general for
        regressions, since we don't seem to have any existing tests).
    
      - "rev-list --parents -- dummy" shows off the problem (and our fix)
    
    Here are the timings for those three on llvm-project.git, before and
    after the fix:
    
    Test                                 master              prio-queue
    ------------------------------------------------------------------------------
    0001.3: rev-list --parents           2.24(2.12+0.12)     2.22(2.11+0.11) -0.9%
    0001.5: rev-list -- dummy            2.89(2.82+0.07)     2.92(2.89+0.03) +1.0%
    0001.6: rev-list --parents -- dummy  83.67(83.57+0.09)   2.98(2.91+0.07) -96.4%
    
    Changes in the first two are basically noise, and you can see we
    approach our lower bound in the final one.
    
    Note that we can't fully get rid of the list argument from
    process_parents(). Other callers do have lists, and it would be hard to
    convert them. They also don't seem to have this problem (probably
    because they actually remove items from the list as they loop, meaning
    it doesn't grow so large in the first place). So this basically just
    drops the "cache_ptr" parameter (which was used only by the one caller
    we're fixing here) and replaces it with a prio_queue. Callers are free
    to use either data structure, depending on what they're prepared to
    handle.
    
    Reported-by: default avatarBjörn Pettersson A <bjorn.a.pettersson@ericsson.com>
    Signed-off-by: default avatarJeff King <peff@peff.net>
    Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
    8320b1db