Skip to content
  • Jeff King's avatar
    combine-diff: handle --find-object in multitree code path · 957876f1
    Jeff King authored and Junio C Hamano's avatar Junio C Hamano committed
    
    
    When doing combined diffs, we have two possible code paths:
    
      - a slower one which independently diffs against each parent, applies
        any filters, and then intersects the resulting paths
    
      - a faster one which walks all trees simultaneously
    
    When the diff options specify that we must do certain filters, like
    pickaxe, then we always use the slow path, since the pickaxe code only
    knows how to handle filepairs, not the n-parent entries generated for
    combined diffs.
    
    But there are two problems with the slow path:
    
     1. It's slow. Running:
    
          git rev-list HEAD | git diff-tree --stdin -r -c
    
        in git.git takes ~3s on my machine. But adding "--find-object" to
        that increases it to ~6s, even though find-object itself should
        incur only a few extra oid comparisons. On linux.git, it's even
        worse: 35s versus 215s.
    
     2. It doesn't catch all cases where a particular path is interesting.
        Consider a merge with parent blobs X and Y for a particular path,
        and end result Z. That should be interesting according to "-c",
        because the result doesn't match either parent. And it should be
        interesting even with "--find-object=X", because "X" went away in
        the merge.
    
        But because we perform each pairwise diff independently, this
        confuses the intersection code. The change from X to Z is still
        interesting according to --find-object. But in the other parent we
        went from Y to Z, so the diff appears empty! That causes the
        intersection code to think that parent didn't change the path, and
        thus it's not interesting for "-c".
    
    This patch fixes both by implementing --find-object for the multitree
    code. It's a bit unfortunate that we have to duplicate some logic from
    diffcore-pickaxe, but this is the best we can do for now. In an ideal
    world, all of the diffcore code would stop thinking about filepairs and
    start thinking about n-parent sets, and we could use the multitree walk
    with all of it.
    
    Until then, there are some leftover warts:
    
      - other pickaxe operations, like -S or -G, still suffer from both
        problems. These would be hard to adapt because they rely on having
        a diff_filespec() for each path to look at content. And we'd need to
        define what an n-way "change" means in each case (probably easy for
        "-S", which can compare counts, but not so clear for -G, which is
        about grepping diffs).
    
      - other options besides --find-object may cause us to use the slow
        pairwise path, in which case we'll go back to producing a different
        (wrong) answer for the X/Y/Z case above.
    
    We may be able to hack around these, but I think the ultimate solution
    will be a larger rewrite of the diffcore code. For now, this patch
    improves one specific case but leaves the rest.
    
    Signed-off-by: default avatarJeff King <peff@peff.net>
    Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
    957876f1