Support different git diff algorithms
Yesterday this paper was on Hacker News:
- https://link.springer.com/article/10.1007%2Fs10664-019-09772-z
- https://news.ycombinator.com/item?id=22689301
It has a thorough analysis of the different diff algorithms available to git and compares the default Myers algorithm to the Histogram algorithm. The paper recommends using the Histogram algorithm:
For patch application, we found that the Histogram is more suitable than Myers for providing the changes of code, from our manual analysis.
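To illustrate how Histogram differs from Myers, here is a heavily simplified Go sketch of the Histogram idea. It is not git's implementation. It counts how often each line occurs in the old side, anchors on a common line with the lowest count, grows that match in both directions, and recurses on the regions before and after it. Real implementations such as JGit's `HistogramDiff` also cap occurrence counts and fall back to another algorithm. Those parts are omitted here. The `Edit`, `histogramDiff` and `render` names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Edit is one output line: ' ' unchanged, '-' deleted, '+' added.
type Edit struct {
	Op   byte
	Line string
}

// histogramDiff is a simplified sketch of the histogram strategy (no
// occurrence cap, no fallback algorithm), for illustration only.
func histogramDiff(a, b []string) []Edit {
	// Histogram: how often each line occurs in the old side.
	counts := make(map[string]int)
	for _, l := range a {
		counts[l]++
	}
	bestA, bestB, bestLen, bestCount := -1, -1, 0, 0
	for j, l := range b {
		c := counts[l]
		if c == 0 || (bestLen > 0 && c > bestCount) {
			continue // not common, or rarer anchors already found
		}
		for i := range a {
			if a[i] != l {
				continue
			}
			// Grow the match around (i, j) in both directions.
			s, t := i, j
			for s > 0 && t > 0 && a[s-1] == b[t-1] {
				s, t = s-1, t-1
			}
			e, f := i+1, j+1
			for e < len(a) && f < len(b) && a[e] == b[f] {
				e, f = e+1, f+1
			}
			// Prefer the rarest anchor, then the longest region.
			if n := e - s; bestLen == 0 || c < bestCount || (c == bestCount && n > bestLen) {
				bestA, bestB, bestLen, bestCount = s, t, n, c
			}
		}
	}
	if bestLen == 0 {
		// No common line: everything old is deleted, everything new is added.
		var out []Edit
		for _, l := range a {
			out = append(out, Edit{'-', l})
		}
		for _, l := range b {
			out = append(out, Edit{'+', l})
		}
		return out
	}
	out := histogramDiff(a[:bestA], b[:bestB])
	for k := 0; k < bestLen; k++ {
		out = append(out, Edit{' ', a[bestA+k]})
	}
	return append(out, histogramDiff(a[bestA+bestLen:], b[bestB+bestLen:])...)
}

// render formats edits in a unified-diff-like body without hunk headers.
func render(edits []Edit) string {
	var sb strings.Builder
	for _, e := range edits {
		sb.WriteByte(e.Op)
		sb.WriteString(e.Line)
		sb.WriteByte('\n')
	}
	return sb.String()
}

func main() {
	oldLines := []string{"a", "b", "c"}
	newLines := []string{"a", "x", "c"}
	fmt.Print(render(histogramDiff(oldLines, newLines)))
}
```

Anchoring on rare lines is the design choice that matters. Frequent lines such as blank lines or lone closing braces are poor anchors, and they are where Myers can produce confusing hunks.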
Currently we seem to use the default algorithm. I wonder:
- whether it makes sense to explore making the diff algorithm configurable
- what impact a different algorithm would have, e.g. on the cost of computing diffs. I assume the more advanced algorithms are more computationally expensive.
However, if diffs become more human-readable, as the paper claims, and this might lead to fewer configs when diffing, these benefits would be useful to customers as well. More readable diffs could mean that engineers introduce fewer bugs.
Gitaly Analysis
The ~"gitaly::git" team has reviewed the problem statement and confirmed that this is feasible both here and here.
This issue will be reassigned to ~"group::code review", because that team will need to adopt the change for the MR review view and will need to assess it.