
Kirill Smelkov authored
Previously diff_tree(), which is now named ll_diff_tree_sha1(), was generating diff_filepair(s) for two trees t1 and t2, and that was usually used for a commit as t1=HEAD~, and t2=HEAD  i.e. to see changes a commit introduces. In Git, however, we have fundamentally built flexibility in that a commit can have many parents  1 for a plain commit, 2 for a simple merge, but also more than 2 for merging several heads at once. For merges there is a so called combinediff, which shows diff, a merge introduces by itself, omitting changes done by any parent. That works through first finding paths, that are different to all parents, and then showing generalized diff, with separate columns for +/ for each parent. The code lives in combinediff.c . There is an impedance mismatch, however, in that a commit could generally have any number of parents, and that while diffing trees, we divide cases for 2tree diffs and morethan2tree diffs. I mean there is no special casing for multiple parents commits in e.g. revisionwalker . That impedance mismatch *hurts* *performance* *badly* for generating combined diffs  in "combinediff: optimize combine_diff_path sets intersection" I've already removed some slowness from it, but from the timings provided there, it could be seen, that combined diffs still cost more than an order of magnitude more cpu time, compared to diff for usual commits, and that would only be an optimistic estimate, if we take into account that for e.g. linux.git there is only one merge for several dozens of plain commits. That slowness comes from the fact that currently, while generating combined diff, a lot of time is spent computing diff(commit,commit^2) just to only then intersect that huge diff to almost small set of files from diff(commit,commit^1). That's because at present, to compute combinediff, for first finding paths, that "every parent touches", we use the following combinediff property/definition: D(A,P1...Pn) = D(A,P1) ^ ... ^ D(A,Pn) (w.r.t. paths) where D(A,P1...Pn) is combined diff between commit A, and parents Pi and D(A,Pi) is usual twotree diff Pi..A So if any of that D(A,Pi) is huge, tracting 1 nparent combinediff as n 1parent diffs and intersecting results will be slow. And usually, for linux.git and other topicbased workflows, that D(A,P2) is huge, because, if mergebase of A and P2, is several dozens of merges (from A, via first parent) below, that D(A,P2) will be diffing sum of merges from several subsystems to 1 subsystem. The solution is to avoid computing n 1parent diffs, and to find changedtoallparents paths via scanning A's and all Pi's trees simultaneously, at each step comparing their entries, and based on that comparison, populate paths result, and deduce we could *skip* *recursing* into subdirectories, if at least for 1 parent, sha1 of that dir tree is the same as in A. That would save us from doing significant amount of needless work. Such approach is very similar to what diff_tree() does, only there we deal with scanning only 2 trees simultaneously, and for n+1 tree, the logic is a bit more complex: D(T,P1...Pn) calculation scheme  D(T,P1...Pn) = D(T,P1) ^ ... ^ D(T,Pn) (regarding resulting paths set) D(T,Pj)  diff between T..Pj D(T,P1...Pn)  combined diff from T to parents P1,...,Pn We start from all trees, which are sorted, and compare their entries in lockstep: T P1 Pn    t p1 pn   ...  imin = argmin(p1...pn)          . .  .  . . . . . . at any time there could be 3 cases: 1) t < p[imin]; 2) t > p[imin]; 3) t = p[imin]. Schematic deduction of what every case means, and what to do, follows: 1) t < p[imin] > ∀j t ∉ Pj > "+t" ∈ D(T,Pj) > D += "+t"; t↓ 2) t > p[imin] 2.1) ∃j: pj > p[imin] > "p[imin]" ∉ D(T,Pj) > D += ø; ∀ pi=p[imin] pi↓ 2.2) ∀i pi = p[imin] > pi ∉ T > "pi" ∈ D(T,Pi) > D += "p[imin]"; ∀i pi↓ 3) t = p[imin] 3.1) ∃j: pj > p[imin] > "+t" ∈ D(T,Pj) > only pi=p[imin] remains to investigate 3.2) pi = p[imin] > investigate δ(t,pi)   v 3.1+3.2) looking at δ(t,pi) ∀i: pi=p[imin]  if all != ø > ⎧δ(t,pi)  if pi=p[imin] > D += ⎨ ⎩"+t"  if pi>p[imin] in any case t↓ ∀ pi=p[imin] pi↓ ~ For comparison, here is how diff_tree() works: D(A,B) calculation scheme  A B   a b a < b > a ∉ B > D(A,B) += +a a↓   a > b > b ∉ A > D(A,B) += b b↓     a = b > investigate δ(a,b) a↓ b↓   . . . . . . ~~~~~~~~ This patch generalizes diff treewalker to work with arbitrary number of parents as described above  i.e. now there is a resulting tree t, and some parents trees tp[i] i=[0..nparent). The generalization builds on the fact that usual diff D(A,B) is by definition the same as combined diff D(A,[B]), so if we could rework the code for common case and make it be not slower for nparent=1 case, usual diff(t1,t2) generation will not be slower, and multiparent diff treewalker would greatly benefit generating combinediff. What we do is as follows: 1) diff treewalker ll_diff_tree_sha1() is internally reworked to be a paths generator (new name diff_tree_paths()), with each generated path being `struct combine_diff_path` with info for path, new sha1,mode and for every parent which sha1,mode it was in it. 2) From that info, we can still generate usual diff queue with struct diff_filepairs, via "exporting" generated combine_diff_path, if we know we run for nparent=1 case. (see emit_diff() which is now named emit_diff_first_parent_only()) 3) In order for diff_can_quit_early(), which checks DIFF_OPT_TST(opt, HAS_CHANGES)) to work, that exporting have to be happening not in bulk, but incrementally, one diff path at a time. For such consumers, there is a new callback in diff_options introduced: >pathchange(opt, struct combine_diff_path *) which, if set to !NULL, is called for every generated path. (see new compat ll_diff_tree_sha1() wrapper around new paths generator for setup) 4) The paths generation itself, is reworked from previous ll_diff_tree_sha1() code according to "D(A,P1...Pn) calculation scheme" provided above: On the start we allocate [nparent] arrays in place what was earlier just for one parent tree. then we just generalize loops, and comparison according to the algorithm. Some notes(*): 1) alloca(), for small arrays, is used for "runs not slower for nparent=1 case than before" goal  if we change it to xmalloc()/free() the timings get ~1% worse. For alloca() we use justintroduced xalloca/xalloca_free compatibility wrappers, so it should not be a portability problem. 2) For every parent tree, we need to keep a tag, whether entry from that parent equals to entry from minimal parent. For performance reasons I'm keeping that tag in entry's mode field in unused bit  see S_IFXMIN_NEQ. Not doing so, we'd need to alloca another [nparent] array, which hurts performance. 3) For emitted paths, memory could be reused, if we know the path was processed via callback and will not be needed later. We use efficient handmade reallocstyle path_appendnew(), that saves us from ~11.5% of potential additional slowdown. 4) goto(s) are used in several places, as the code executes a little bit faster with lowered register pressure. Also  we should now check for FIND_COPIES_HARDER not only when two entries names are the same, and their hashes are equal, but also for a case, when a path was removed from some of all parents having it. The reason is, if we don't, that path won't be emitted at all (see "a > xi" case), and we'll just skip it, and FIND_COPIES_HARDER wants all paths  with diff or without  to be emitted, to be later analyzed for being copies sources. The new check is only necessary for nparent >1, as for nparent=1 case xmin_eqtotal always =1 =nparent, and a path is always added to diff as removal. ~~~~~~~~ Timings for # without c, i.e. testing only nparent=1 case `git log raw noabbrev norenames` before and after the patch are as follows: navy.git linux.git v3.10..v3.11 before 0.611s 1.889s after 0.619s 1.907s slowdown 1.3% 0.9% This timings show we did no harm to usual diff(tree1,tree2) generation. From the table we can see that we actually did ~1% slowdown, but I think I've "earned" that 1% in the previous patch ("treediff: reuse base str(buf) memory on subtree recursion", HEAD~~) so for nparent=1 case, net timings stays approximately the same. The output also stayed the same. (*) If we revert 1)4) to more usual techniques, for nparent=1 case, we'll get ~22.5% of additional slowdown, which I've tried to avoid, as "do no harm for nparent=1 case" rule. For linux.git, combined diff will run an order of magnitude faster and appropriate timings will be provided in the next commit, as we'll be taking advantage of the new diff treewalker for combineddiff generation there. P.S. and combined diff is not some exotic/forplayonly stuff  for example for a program I write to represent Git archives as readonly filesystem, there is initial scan with `git log reverse raw noabbrev norenames c` to extract log of what was created/changed when, as a result building a map {} sha1 > in which commit (and date) a content was added that `c` means also show combined diff for merges, and without them, if a merge is nontrivial (merges changes from two parents with both having separate changes to a file), or an evil one, the map will not be full, i.e. some valid sha1 would be absent from it. That case was my initial motivation for combined diffs speedup. Signedoffby: Kirill Smelkov <kirr@mns.spb.ru> Signedoffby: Junio C Hamano <gitster@pobox.com>
72441af7