Add support for alternate diff algorithms to match git CLI behavior
Proposal
Add support for selecting between the diff algorithms offered by the --diff-algorithm argument to git diff. Both the GitLab web interface and GraphQL API could benefit from this support, for improved consistency with the git CLI.
Background
Diff workflows in the git CLI currently support four different algorithms (myers, minimal, patience and histogram) for identifying Longest Common Subsequences (LCS), which in turn can produce different selections of added/removed hunks in the resulting diff. The default algorithm is myers, named after the Myers (1986) algorithm. Another option is histogram, a more modern algorithm which can produce cleaner-looking diffs with more coherent hunk selection, particularly for code, which often contains large numbers of identical lines (curly braces, etc.) that trip up a Myers diff.
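To illustrate why the choice of algorithm changes the hunks at all, here is a minimal sketch in Python. It is neither Myers nor histogram: it is a toy LCS backtrack with a hypothetical `prefer_delete` tie-breaking flag, intended only to show that the same pair of inputs can admit more than one equally short edit script.

```python
def lcs_table(old, new):
    """table[i][j] is the LCS length of old[i:] and new[j:]."""
    table = [[0] * (len(new) + 1) for _ in range(len(old) + 1)]
    for i in range(len(old) - 1, -1, -1):
        for j in range(len(new) - 1, -1, -1):
            if old[i] == new[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def toy_diff(old, new, prefer_delete=True):
    """Return a line diff; prefer_delete only decides ties between equal-cost paths."""
    table = lcs_table(old, new)
    i = j = 0
    out = []
    while i < len(old) and j < len(new):
        if old[i] == new[j]:
            out.append("  " + old[i])
            i += 1
            j += 1
            continue
        after_delete = table[i + 1][j]  # LCS still reachable if old[i] is removed
        after_insert = table[i][j + 1]  # LCS still reachable if new[j] is added
        if after_delete > after_insert or (after_delete == after_insert and prefer_delete):
            out.append("- " + old[i])
            i += 1
        else:
            out.append("+ " + new[j])
            j += 1
    out.extend("- " + line for line in old[i:])
    out.extend("+ " + line for line in new[j:])
    return out


old_lines = ["foo()", "}"]
new_lines = ["}", "foo()"]
print(toy_diff(old_lines, new_lines, prefer_delete=True))
print(toy_diff(old_lines, new_lines, prefer_delete=False))
```

Both results contain one removal and one addition, but they keep a different line as unchanged context. Real algorithms differ in more sophisticated ways, yet the effect on hunk selection and on per-line attribution is of the same kind.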
This paper goes into much more detail about the differences between Myers and histogram diff, along with multiple relevant examples, and argues that histogram is the superior choice: Nugroho, Y.S., Hata, H. & Matsumoto, K. How different are different diff algorithms in Git?. Empir Software Eng 25, 790–823 (2020)
As for GitLab's current behavior, it appears that GitLab produces diffs using a third, older algorithm, Hunt-McIlroy (1976), through the diff-lcs Ruby gem. Its results are largely similar to Myers diffs but are likely not identical in all cases, so the current output may not always be consistent with even a default-configured git CLI diff operation.
As a result:
- the GitLab web interface for change history produces messier diffs than advocates of histogram diff are used to seeing.
- users of GraphQL APIs for obtaining diffs are given results that can be inconsistent with local operations (different change hunks, different diffstats, etc.)
There does not appear to be any way to select between the different algorithms in either the web interface or the GraphQL API.
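
For comparison, the local behavior that this proposal aims to match can be reproduced with a short Python sketch that shells out to git. The helper names `diff_command` and `run_diff` and the revision arguments are hypothetical, chosen for illustration; only the `--diff-algorithm` option and its four values come from git itself.

```python
import subprocess

# Values accepted by `git diff --diff-algorithm=<name>`.
ALGORITHMS = ("myers", "minimal", "patience", "histogram")


def diff_command(algorithm, old_rev, new_rev):
    """Build the argument list for a git diff using the given algorithm."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown diff algorithm: {algorithm}")
    return ["git", "diff", f"--diff-algorithm={algorithm}", "--no-color", old_rev, new_rev]


def run_diff(algorithm, old_rev, new_rev, repo_path="."):
    """Run the diff inside repo_path and return its text output."""
    result = subprocess.run(
        diff_command(algorithm, old_rev, new_rev),
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Assumes the current directory is a git repository with at least two commits.
    for name in ALGORITHMS:
        output = run_diff(name, "HEAD~1", "HEAD")
        print(f"{name}: {len(output.splitlines())} lines of diff output")
```

Running this against a commit that touches brace-heavy code is one way to see whether the algorithms disagree for a given change, and therefore whether a diff obtained from GitLab could differ from a local one.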