1. 01 Aug, 2018 1 commit
    • Stefan Beller's avatar
      xdiff: reduce indent heuristic overhead · 301ef854
      Stefan Beller authored
      Skip searching for better indentation heuristics if we'd slide a hunk more
      than its size. This is the easiest fix proposed in the analysis[1] in
      response to a patch that mercurial took for xdiff to limit searching
      by a constant. Using a performance test as:
      
           #!python
           open('a', 'w').write(" \n" * 1000000)
           open('b', 'w').write(" \n" * 1000001)
      
      This patch reduces the execution of "git diff --no-index a b" from
      0.70s to 0.31s. However limiting the sliding to the size of the diff hunk,
      which was proposed as a solution (that I found easiest to implement for
      now) is not optimal for cases like
      
           open('a', 'w').write(" \n" * 1000000)
           open('b', 'w').write(" \n" * 2000000)
      
      as then we'd still slide 1000000 times.
      
      In addition to limiting the sliding to size of the hunk, also limit by a
      constant. Choose 100 lines as the constant as that fits more than a screen,
      which really means that the diff sliding is probably not providing a lot
      of benefit anyway.
      
      [1] https://public-inbox.org/git/72ac1ac2-f567-f241-41d6-d0f83072e0b3@alum.mit.edu/Reported-by: default avatarJun Wu <quark@fb.com>
      Analysis-by: default avatarMichael Haggerty <mhagger@alum.mit.edu>
      Signed-off-by: Stefan Beller's avatarStefan Beller <sbeller@google.com>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      301ef854
  2. 17 Jul, 2018 1 commit
  3. 09 Nov, 2017 1 commit
  4. 23 Dec, 2016 1 commit
    • Junio C Hamano's avatar
      diff: retire "compaction" heuristics · 3cde4e02
      Junio C Hamano authored
      When a patch inserts a block of lines, whose last lines are the
      same as the existing lines that appear before the inserted block,
      "git diff" can choose any place between these existing lines as the
      boundary between the pre-context and the added lines (adjusting the
      end of the inserted block as appropriate) to come up with variants
      of the same patch, and some variants are easier to read than others.
      
      We have been trying to improve the choice of this boundary, and Git
      2.11 shipped with an experimental "compaction-heuristic".  Since
      then another attempt to improve the logic further resulted in a new
      "indent-heuristic" logic.  It is agreed that the latter gives better
      result overall, and the former outlived its usefulness.
      
      Retire "compaction", and keep "indent" as an experimental feature.
      The latter hopefully will be turned on by default in a future
      release, but that should be done as a separate step.
      Suggested-by: default avatarJeff King <peff@peff.net>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      3cde4e02
  5. 27 Sep, 2016 1 commit
    • Jeff King's avatar
      xdiff: rename "struct group" to "struct xdlgroup" · 134e40d7
      Jeff King authored
      Commit e8adf23d (xdl_change_compact(): introduce the concept
      of a change group, 2016-08-22) added a "struct group" type
      to xdiff/xdiffi.c. But the POSIX system header "grp.h"
      already defines "struct group" (it is part of the getgrnam
      interface). This happens to work because the new type is
      local to xdiffi.c, and the xdiff code includes a relatively
      small set of system headers. But it will break compilation
      if xdiff ever switches to using git-compat-util.h.  It can
      also probably cause confusion with tools that look at the
      whole code base, like coccinelle or ctags.
      
      Let's resolve by giving the xdiff variant a scoped name,
      which is closer to other xdiff types anyway (e.g.,
      xdlfile_t, though note that xdiff is fond if typedefs when
      Git usually is not).
      Signed-off-by: default avatarJeff King <peff@peff.net>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      134e40d7
  6. 19 Sep, 2016 1 commit
    • Michael Haggerty's avatar
      diff: improve positioning of add/delete blocks in diffs · 433860f3
      Michael Haggerty authored
      Some groups of added/deleted lines in diffs can be slid up or down,
      because lines at the edges of the group are not unique. Picking good
      shifts for such groups is not a matter of correctness but definitely has
      a big effect on aesthetics. For example, consider the following two
      diffs. The first is what standard Git emits:
      
          --- a/9c572b21:git-send-email.perl
          +++ b/6dcfa306:git-send-email.perl
          @@ -231,6 +231,9 @@ if (!defined $initial_reply_to && $prompting) {
           }
      
           if (!$smtp_server) {
          +       $smtp_server = $repo->config('sendemail.smtpserver');
          +}
          +if (!$smtp_server) {
                  foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
                          if (-x $_) {
                                  $smtp_server = $_;
      
      The following diff is equivalent, but is obviously preferable from an
      aesthetic point of view:
      
          --- a/9c572b21:git-send-email.perl
          +++ b/6dcfa306:git-send-email.perl
          @@ -230,6 +230,9 @@ if (!defined $initial_reply_to && $prompting) {
                  $initial_reply_to =~ s/(^\s+|\s+$)//g;
           }
      
          +if (!$smtp_server) {
          +       $smtp_server = $repo->config('sendemail.smtpserver');
          +}
           if (!$smtp_server) {
                  foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
                          if (-x $_) {
      
      This patch teaches Git to pick better positions for such "diff sliders"
      using heuristics that take the positions of nearby blank lines and the
      indentation of nearby lines into account.
      
      The existing Git code basically always shifts such "sliders" as far down
      in the file as possible. The only exception is when the slider can be
      aligned with a group of changed lines in the other file, in which case
      Git favors depicting the change as one add+delete block rather than one
      add and a slightly offset delete block. This naive algorithm often
      yields ugly diffs.
      
      Commit d634d61e improved the situation somewhat by preferring to
      position add/delete groups to make their last line a blank line, when
      that is possible. This heuristic does more good than harm, but (1) it
      can only help if there are blank lines in the right places, and (2)
      always picks the last blank line, even if there are others that might be
      better. The end result is that it makes perhaps 1/3 as many errors as
      the default Git algorithm, but that still leaves a lot of ugly diffs.
      
      This commit implements a new and much better heuristic for picking
      optimal "slider" positions using the following approach: First observe
      that each hypothetical positioning of a diff slider introduces two
      splits: one between the context lines preceding the group and the first
      added/deleted line, and the other between the last added/deleted line
      and the first line of context following it. It tries to find the
      positioning that creates the least bad splits.
      
      Splits are evaluated based only on the presence and locations of nearby
      blank lines, and the indentation of lines near the split. Basically, it
      prefers to introduce splits adjacent to blank lines, between lines that
      are indented less, and between lines with the same level of indentation.
      In more detail:
      
      1. It measures the following characteristics of a proposed splitting
         position in a `struct split_measurement`:
      
         * the number of blank lines above the proposed split
         * whether the line directly after the split is blank
         * the number of blank lines following that line
         * the indentation of the nearest non-blank line above the split
         * the indentation of the line directly below the split
         * the indentation of the nearest non-blank line after that line
      
      2. It combines the measured attributes using a bunch of
         empirically-optimized weighting factors to derive a `struct
         split_score` that measures the "badness" of splitting the text at
         that position.
      
      3. It combines the `split_score` for the top and the bottom of the
         slider at each of its possible positions, and selects the position
         that has the best `split_score`.
      
      I determined the initial set of weighting factors by collecting a corpus
      of Git histories from 29 open-source software projects in various
      programming languages. I generated many diffs from this corpus, and
      determined the best positioning "by eye" for about 6600 diff sliders. I
      used about half of the repositories in the corpus (corresponding to
      about 2/3 of the sliders) as a training set, and optimized the weights
      against this corpus using a crude automated search of the parameter
      space to get the best agreement with the manually-determined values.
      Then I tested the resulting heuristic against the full corpus. The
      results are summarized in the following table, in column `indent-1`:
      
      | repository            | count |      Git 2.9.0 |     compaction | compaction-fixed |       indent-1 |       indent-2 |
      | --------------------- | ----- | -------------- | -------------- | ---------------- | -------------- | -------------- |
      | afnetworking          |   109 |    89  (81.7%) |    37  (33.9%) |      37  (33.9%) |     2   (1.8%) |     2   (1.8%) |
      | alamofire             |    30 |    18  (60.0%) |    14  (46.7%) |      15  (50.0%) |     0   (0.0%) |     0   (0.0%) |
      | angular               |   184 |   127  (69.0%) |    39  (21.2%) |      23  (12.5%) |     5   (2.7%) |     5   (2.7%) |
      | animate               |   313 |     2   (0.6%) |     2   (0.6%) |       2   (0.6%) |     2   (0.6%) |     2   (0.6%) |
      | ant                   |   380 |   356  (93.7%) |   152  (40.0%) |     148  (38.9%) |    15   (3.9%) |    15   (3.9%) | *
      | bugzilla              |   306 |   263  (85.9%) |   109  (35.6%) |      99  (32.4%) |    14   (4.6%) |    15   (4.9%) | *
      | corefx                |   126 |    91  (72.2%) |    22  (17.5%) |      21  (16.7%) |     6   (4.8%) |     6   (4.8%) |
      | couchdb               |    78 |    44  (56.4%) |    26  (33.3%) |      28  (35.9%) |     6   (7.7%) |     6   (7.7%) | *
      | cpython               |   937 |   158  (16.9%) |    50   (5.3%) |      49   (5.2%) |     5   (0.5%) |     5   (0.5%) | *
      | discourse             |   160 |    95  (59.4%) |    42  (26.2%) |      36  (22.5%) |    18  (11.2%) |    13   (8.1%) |
      | docker                |   307 |   194  (63.2%) |   198  (64.5%) |     253  (82.4%) |     8   (2.6%) |     8   (2.6%) | *
      | electron              |   163 |   132  (81.0%) |    38  (23.3%) |      39  (23.9%) |     6   (3.7%) |     6   (3.7%) |
      | git                   |   536 |   470  (87.7%) |    73  (13.6%) |      78  (14.6%) |    16   (3.0%) |    16   (3.0%) | *
      | gitflow               |   127 |     0   (0.0%) |     0   (0.0%) |       0   (0.0%) |     0   (0.0%) |     0   (0.0%) |
      | ionic                 |   133 |    89  (66.9%) |    29  (21.8%) |      38  (28.6%) |     1   (0.8%) |     1   (0.8%) |
      | ipython               |   482 |   362  (75.1%) |   167  (34.6%) |     169  (35.1%) |    11   (2.3%) |    11   (2.3%) | *
      | junit                 |   161 |   147  (91.3%) |    67  (41.6%) |      66  (41.0%) |     1   (0.6%) |     1   (0.6%) | *
      | lighttable            |    15 |     5  (33.3%) |     0   (0.0%) |       2  (13.3%) |     0   (0.0%) |     0   (0.0%) |
      | magit                 |    88 |    75  (85.2%) |    11  (12.5%) |       9  (10.2%) |     1   (1.1%) |     0   (0.0%) |
      | neural-style          |    28 |     0   (0.0%) |     0   (0.0%) |       0   (0.0%) |     0   (0.0%) |     0   (0.0%) |
      | nodejs                |   781 |   649  (83.1%) |   118  (15.1%) |     111  (14.2%) |     4   (0.5%) |     5   (0.6%) | *
      | phpmyadmin            |   491 |   481  (98.0%) |    75  (15.3%) |      48   (9.8%) |     2   (0.4%) |     2   (0.4%) | *
      | react-native          |   168 |   130  (77.4%) |    79  (47.0%) |      81  (48.2%) |     0   (0.0%) |     0   (0.0%) |
      | rust                  |   171 |   128  (74.9%) |    30  (17.5%) |      27  (15.8%) |    16   (9.4%) |    14   (8.2%) |
      | spark                 |   186 |   149  (80.1%) |    52  (28.0%) |      52  (28.0%) |     2   (1.1%) |     2   (1.1%) |
      | tensorflow            |   115 |    66  (57.4%) |    48  (41.7%) |      48  (41.7%) |     5   (4.3%) |     5   (4.3%) |
      | test-more             |    19 |    15  (78.9%) |     2  (10.5%) |       2  (10.5%) |     1   (5.3%) |     1   (5.3%) | *
      | test-unit             |    51 |    34  (66.7%) |    14  (27.5%) |       8  (15.7%) |     2   (3.9%) |     2   (3.9%) | *
      | xmonad                |    23 |    22  (95.7%) |     2   (8.7%) |       2   (8.7%) |     1   (4.3%) |     1   (4.3%) | *
      | --------------------- | ----- | -------------- | -------------- | ---------------- | -------------- | -------------- |
      | totals                |  6668 |  4391  (65.9%) |  1496  (22.4%) |    1491  (22.4%) |   150   (2.2%) |   144   (2.2%) |
      | totals (training set) |  4552 |  3195  (70.2%) |  1053  (23.1%) |    1061  (23.3%) |    86   (1.9%) |    88   (1.9%) |
      | totals (test set)     |  2116 |  1196  (56.5%) |   443  (20.9%) |     430  (20.3%) |    64   (3.0%) |    56   (2.6%) |
      
      In this table, the numbers are the count and percentage of human-rated
      sliders that the corresponding algorithm got *wrong*. The columns are
      
      * "repository" - the name of the repository used. I used the diffs
        between successive non-merge commits on the HEAD branch of the
        corresponding repository.
      
      * "count" - the number of sliders that were human-rated. I chose most,
        but not all, sliders to rate from those among which the various
        algorithms gave different answers.
      
      * "Git 2.9.0" - the default algorithm used by `git diff` in Git 2.9.0.
      
      * "compaction" - the heuristic used by `git diff --compaction-heuristic`
        in Git 2.9.0.
      
      * "compaction-fixed" - the heuristic used by `git diff
        --compaction-heuristic` after the fixes from earlier in this patch
        series. Note that the results are not dramatically different than
        those for "compaction". Both produce non-ideal diffs only about 1/3 as
        often as the default `git diff`.
      
      * "indent-1" - the new `--indent-heuristic` algorithm, using the first
        set of weighting factors, determined as described above.
      
      * "indent-2" - the new `--indent-heuristic` algorithm, using the final
        set of weighting factors, determined as described below.
      
      * `*` - indicates that repo was part of training set used to determine
        the first set of weighting factors.
      
      The fact that the heuristic performed nearly as well on the test set as
      on the training set in column "indent-1" is a good indication that the
      heuristic was not over-trained. Given that fact, I ran a second round of
      optimization, using the entire corpus as the training set. The resulting
      set of weights gave the results in column "indent-2". These are the
      weights included in this patch.
      
      The final result gives consistently and significantly better results
      across the whole corpus than either `git diff` or `git diff
      --compaction-heuristic`. It makes only about 1/30 as many errors as the
      former and about 1/10 as many errors as the latter. (And a good fraction
      of the remaining errors are for diffs that involve weirdly-formatted
      code, sometimes apparently machine-generated.)
      
      The tools that were used to do this optimization and analysis, along
      with the human-generated data values, are recorded in a separate project
      [1].
      
      This patch adds a new command-line option `--indent-heuristic`, and a
      new configuration setting `diff.indentHeuristic`, that activate this
      heuristic. This interface is only meant for testing purposes, and should
      be finalized before including this change in any release.
      
      [1] https://github.com/mhagger/diff-slider-toolsSigned-off-by: default avatarMichael Haggerty <mhagger@alum.mit.edu>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      433860f3
  7. 23 Aug, 2016 5 commits
    • Michael Haggerty's avatar
      xdl_change_compact(): introduce the concept of a change group · e8adf23d
      Michael Haggerty authored
      The idea of xdl_change_compact() is fairly simple:
      
      * Proceed through groups of changed lines in the file to be compacted,
        keeping track of the corresponding location in the "other" file.
      
      * If possible, slide the group up and down to try to give the most
        aesthetically pleasing diff. Whenever it is slid, the current location
        in the other file needs to be adjusted.
      
      But these simple concepts are obfuscated by a lot of index handling that
      is written in terse, subtle, and varied patterns. I found it very hard
      to convince myself that the function was correct.
      
      So introduce a "struct group" that represents a group of changed lines
      in a file. Add some functions that perform elementary operations on
      groups:
      
      * Initialize a group to the first group in a file
      * Move to the next or previous group in a file
      * Slide a group up or down
      
      Even though the resulting code is longer, I think it is easier to
      understand and review. Its performance is not changed
      appreciably (though it would be if `group_next()` and `group_previous()`
      were not inlined).
      
      ...and in fact, the rewriting helped me discover another bug in the
      --compaction-heuristic code: The update of blank_lines was never done
      for the highest possible position of the group. This means that it could
      fail to slide the group to its highest possible position, even if that
      position had a blank line as its last line. So for example, it yielded
      the following diff:
      
          $ git diff --no-index --compaction-heuristic a.txt b.txt
          diff --git a/a.txt b/b.txt
          index e53969f..0d60c5fe 100644
          --- a/a.txt
          +++ b/b.txt
          @@ -1,3 +1,7 @@
           1
           A
          +
          +B
          +
          +A
           2
      
      when in fact the following diff is better (according to the rules of
      --compaction-heuristic):
      
          $ git diff --no-index --compaction-heuristic a.txt b.txt
          diff --git a/a.txt b/b.txt
          index e53969f..0d60c5fe 100644
          --- a/a.txt
          +++ b/b.txt
          @@ -1,3 +1,7 @@
           1
          +A
          +
          +B
          +
           A
           2
      
      The new code gives the bottom answer.
      Signed-off-by: default avatarMichael Haggerty <mhagger@alum.mit.edu>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      e8adf23d
    • Michael Haggerty's avatar
      recs_match(): take two xrecord_t pointers as arguments · 152598cb
      Michael Haggerty authored
      There is no reason for it to take an array and two indexes as argument,
      as it only accesses two elements of the array.
      Signed-off-by: default avatarMichael Haggerty <mhagger@alum.mit.edu>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      152598cb
    • Michael Haggerty's avatar
      is_blank_line(): take a single xrecord_t as argument · c06c0b63
      Michael Haggerty authored
      There is no reason for it to take an array and index as argument, as it
      only accesses a single element of the array.
      Signed-off-by: default avatarMichael Haggerty <mhagger@alum.mit.edu>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      c06c0b63
    • Michael Haggerty's avatar
      xdl_change_compact(): only use heuristic if group can't be matched · cb0eded8
      Michael Haggerty authored
      If the changed group of lines can be matched to a group in the other
      file, then that positioning should take precedence over the compaction
      heuristic.
      
      The old code tried the heuristic unconditionally, which cost redundant
      effort and also was broken if the matching code had already shifted the
      group higher than the blank line.
      Signed-off-by: default avatarMichael Haggerty <mhagger@alum.mit.edu>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      cb0eded8
    • Michael Haggerty's avatar
      xdl_change_compact(): fix compaction heuristic to adjust ixo · a8fd78cc
      Michael Haggerty authored
      The code branch used for the compaction heuristic forgot to keep ixo in
      sync while the group was shifted. This is certainly wrong, as it causes
      the two counters to get out of sync.
      
      I *think* that this bug could also have caused the function to read past
      the end of the rchgo array, though I haven't done the work to prove it
      for sure. Here is my reasoning:
      
      If ixo is not decremented correctly during one iteration of the outer
      while loop, then it will loose sync with the ix counter. In particular,
      ixo will be too large.
      
      Suppose that the next iterations of the outer while loop (i.e.,
      processing the next block of add/delete lines) don't have any sliders.
      Then the ixo counter would be incremented by the number of non-changed
      lines in xdf, which is the same as the number of non-changed lines in
      xdfo that *should have* followed the group that experienced the
      malfunction. But since ixo was too large at the end of that iteration,
      it will be incremented past the end of the xdfo->rchg array, and will
      try to read that memory illegally.
      Signed-off-by: default avatarMichael Haggerty <mhagger@alum.mit.edu>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      a8fd78cc
  8. 19 Apr, 2016 1 commit
    • Stefan Beller's avatar
      xdiff: implement empty line chunk heuristic · d634d61e
      Stefan Beller authored
      In order to produce the smallest possible diff and combine several diff
      hunks together, we implement a heuristic from GNU Diff which moves diff
      hunks forward as far as possible when we find common context above and
      below a diff hunk. This sometimes produces less readable diffs when
      writing C, Shell, or other programming languages, ie:
      
      ...
       /*
      + *
      + *
      + */
      +
      +/*
      ...
      
      instead of the more readable equivalent of
      
      ...
      +/*
      + *
      + *
      + */
      +
       /*
      ...
      
      Implement the following heuristic to (optionally) produce the desired
      output.
      
        If there are diff chunks which can be shifted around, shift each hunk
        such that the last common empty line is below the chunk with the rest
        of the context above.
      
      This heuristic appears to resolve the above example and several other
      common issues without producing significantly weird results. However, as
      with any heuristic it is not really known whether this will always be
      more optimal. Thus, it can be disabled via diff.compactionHeuristic.
      Signed-off-by: Stefan Beller's avatarStefan Beller <sbeller@google.com>
      Signed-off-by: default avatarJacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Stefan Beller's avatarStefan Beller <sbeller@google.com>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      d634d61e
  9. 18 Apr, 2016 1 commit
  10. 19 Jun, 2013 1 commit
    • Antoine Pelisse's avatar
      diff: add --ignore-blank-lines option · 36617af7
      Antoine Pelisse authored
      The goal of the patch is to introduce the GNU diff
      -B/--ignore-blank-lines as closely as possible. The short option is not
      available because it's already used for "break-rewrites".
      
      When this option is used, git-diff will not create hunks that simply
      add or remove empty lines, but will still show empty lines
      addition/suppression if they are close enough to "valuable" changes.
      
      There are two differences between this option and GNU diff -B option:
      - GNU diff doesn't have "--inter-hunk-context", so this must be handled
      - The following sequence looks like a bug (context is displayed twice):
      
          $ seq 5 >file1
          $ cat <<EOF >file2
          change
          1
          2
      
          3
          4
          5
          change
          EOF
          $ diff -u -B file1 file2
          --- file1	2013-06-08 22:13:04.471517834 +0200
          +++ file2	2013-06-08 22:13:23.275517855 +0200
          @@ -1,5 +1,7 @@
          +change
           1
           2
          +
           3
           4
           5
          @@ -3,3 +5,4 @@
           3
           4
           5
          +change
      
      So here is a more thorough description of the option:
      - real changes are interesting
      - blank lines that are close enough (less than context size) to
      interesting changes are considered interesting (recursive definition)
      - "context" lines are used around each hunk of interesting changes
      - If two hunks are separated by less than "inter-hunk-context", they
      will be merged into one.
      
      The implementation does the "interesting changes selection" in a single
      pass.
      Signed-off-by: Antoine Pelisse's avatarAntoine Pelisse <apelisse@gmail.com>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      36617af7
  11. 12 Apr, 2013 1 commit
  12. 09 May, 2012 2 commits
  13. 19 Feb, 2012 1 commit
  14. 12 Jul, 2011 1 commit
    • Ray's avatar
      teach --histogram to diff · 8c912eea
      Ray authored
      Port JGit's HistogramDiff algorithm over to C. Rough numbers (TODO) show
      that it is faster than its --patience cousin, as well as the default
      Meyers algorithm.
      
      The implementation has been reworked to use structs and pointers,
      instead of bitmasks, thus doing away with JGit's 2^28 line limit.
      
      We also use xdiff's default hash table implementation (xdl_hash_bits()
      with XDL_HASHLONG()) for convenience.
      Signed-off-by: Ray's avatarTay Ray Chuan <rctay89@gmail.com>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      8c912eea
  15. 23 Jul, 2009 1 commit
  16. 23 Apr, 2009 1 commit
  17. 16 Mar, 2009 1 commit
  18. 07 Jan, 2009 1 commit
    • Johannes Schindelin's avatar
      Implement the patience diff algorithm · 92b7de93
      Johannes Schindelin authored
      The patience diff algorithm produces slightly more intuitive output
      than the classic Myers algorithm, as it does not try to minimize the
      number of +/- lines first, but tries to preserve the lines that are
      unique.
      
      To this end, it first determines lines that are unique in both files,
      then the maximal sequence which preserves the order (relative to both
      files) is extracted.
      
      Starting from this initial set of common lines, the rest of the lines
      is handled recursively, with Myers' algorithm as a fallback when
      the patience algorithm fails (due to no common unique lines).
      
      This patch includes memory leak fixes by Pierre Habouzit.
      Signed-off-by: Johannes Schindelin's avatarJohannes Schindelin <johannes.schindelin@gmx.de>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      92b7de93
  19. 25 Oct, 2008 1 commit
    • Brian Downing's avatar
      Allow alternate "low-level" emit function from xdl_diff · ef2e62fe
      Brian Downing authored
      For some users (e.g. git blame), getting textual patch output is just
      extra work, as they can get all the information they need from the low-
      level diff structures.  Allow for an alternate low-level emit function
      to be defined to allow bypassing the textual patch generation; set
      xemitconf_t's emit_func member to enable this.
      
      The (void (*)()) type is pretty ugly, but the alternative would be to
      include most of the private xdiff headers in xdiff.h to get the types
      required for the "proper" function prototype.  Also, a (void *) won't
      work, as ANSI C doesn't allow a function pointer to be cast to an
      object pointer.
      Signed-off-by: Brian Downing's avatarBrian Downing <bdowning@lavos.net>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      ef2e62fe
  20. 16 Nov, 2007 1 commit
  21. 07 Jun, 2007 1 commit
    • Junio C Hamano's avatar
      War on whitespace · a6080a0a
      Junio C Hamano authored
      This uses "git-apply --whitespace=strip" to fix whitespace errors that have
      crept in to our source files over time.  There are a few files that need
      to have trailing whitespaces (most notably, test vectors).  The results
      still passes the test, and build result in Documentation/ area is unchanged.
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      a6080a0a
  22. 03 Dec, 2006 1 commit
    • Johannes Schindelin's avatar
      xdiff: add xdl_merge() · 857b933e
      Johannes Schindelin authored
      This new function implements the functionality of RCS merge, but
      in-memory. It returns < 0 on error, otherwise the number of conflicts.
      
      Finding the conflicting lines can be a very expensive task. You can
      control the eagerness of this algorithm:
      
      - a level value of 0 means that all overlapping changes are treated
        as conflicts,
      - a value of 1 means that if the overlapping changes are identical,
        it is not treated as a conflict.
      - If you set level to 2, overlapping changes will be analyzed, so that
        almost identical changes will not result in huge conflicts. Rather,
        only the conflicting lines will be shown inside conflict markers.
      
      With each increasing level, the algorithm gets slower, but more accurate.
      Note that the code for level 2 depends on the simple definition of
      mmfile_t specific to git, and therefore it will be harder to port that
      to LibXDiff.
      Signed-off-by: Johannes Schindelin's avatarJohannes Schindelin <Johannes.Schindelin@gmx.de>
      Signed-off-by: default avatarJunio C Hamano <junkio@cox.net>
      857b933e
  23. 10 Jul, 2006 1 commit
  24. 24 Jun, 2006 1 commit
  25. 13 Apr, 2006 1 commit
  26. 09 Apr, 2006 1 commit
    • Marco Roeland's avatar
      xdiff/xdiffi.c: fix warnings about possibly uninitialized variables · 0ed49a3e
      Marco Roeland authored
      Compiling this module gave the following warnings (some double dutch!):
      
      xdiff/xdiffi.c: In functie 'xdl_recs_cmp':
      xdiff/xdiffi.c:298: let op: 'spl.i1' may be used uninitialized in this function
      xdiff/xdiffi.c:298: let op: 'spl.i2' may be used uninitialized in this function
      xdiff/xdiffi.c:219: let op: 'fbest1' may be used uninitialized in this function
      xdiff/xdiffi.c:219: let op: 'bbest1' may be used uninitialized in this function
      
      A superficial tracking of their usage, without deeper knowledge about the
      algorithm, indeed confirms that there are code paths on which these
      variables will be used uninitialized. In practice these code paths might never
      be reached, but then these fixes will not change the algorithm. If these
      code paths are ever reached we now at least have a predictable outcome. And
      should the very small performance impact of these initializations be
      noticeable, then they should at least be replaced by comments why certain
      code paths will never be reached.
      
      Some extra initializations in this patch now fix the warnings.
      0ed49a3e
  27. 04 Apr, 2006 1 commit
  28. 26 Mar, 2006 1 commit
    • Linus Torvalds's avatar
      Use a *real* built-in diff generator · 3443546f
      Linus Torvalds authored
      This uses a simplified libxdiff setup to generate unified diffs _without_
      doing  fork/execve of GNU "diff".
      
      This has several huge advantages, for example:
      
      Before:
      
      	[torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null
      
      	real    0m24.818s
      	user    0m13.332s
      	sys     0m8.664s
      
      After:
      
      	[torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null
      
      	real    0m4.563s
      	user    0m2.944s
      	sys     0m1.580s
      
      and the fact that this should be a lot more portable (ie we can ignore all
      the issues with doing fork/execve under Windows).
      
      Perhaps even more importantly, this allows us to do diffs without actually
      ever writing out the git file contents to a temporary file (and without
      any of the shell quoting issues on filenames etc etc).
      
      NOTE! THIS PATCH DOES NOT DO THAT OPTIMIZATION YET! I was lazy, and the
      current "diff-core" code actually will always write the temp-files,
      because it used to be something that you simply had to do. So this current
      one actually writes a temp-file like before, and then reads it into memory
      again just to do the diff. Stupid.
      
      But if this basic infrastructure is accepted, we can start switching over
      diff-core to not write temp-files, which should speed things up even
      further, especially when doing big tree-to-tree diffs.
      
      Now, in the interest of full disclosure, I should also point out a few
      downsides:
      
       - the libxdiff algorithm is different, and I bet GNU diff has gotten a
         lot more testing. And the thing is, generating a diff is not an exact
         science - you can get two different diffs (and you will), and they can
         both be perfectly valid. So it's not possible to "validate" the
         libxdiff output by just comparing it against GNU diff.
      
       - GNU diff does some nice eye-candy, like trying to figure out what the
         last function was, and adding that information to the "@@ .." line.
         libxdiff doesn't do that.
      
       - The libxdiff thing has some known deficiencies. In particular, it gets
         the "\No newline at end of file" case wrong. So this is currently for
         the experimental branch only. I hope Davide will help fix it.
      
      That said, I think the huge performance advantage, and the fact that it
      integrates better is definitely worth it. But it should go into a
      development branch at least due to the missing newline issue.
      
      Technical note: this is based on libxdiff-0.17, but I did some surgery to
      get rid of the extraneous fat - stuff that git doesn't need, and seriously
      cutting down on mmfile_t, which had much more capabilities than the diff
      algorithm either needed or used. In this version, "mmfile_t" is just a
      trivial <pointer,length> tuple.
      
      That said, I tried to keep the differences to simple removals, so that you
      can do a diff between this and the libxdiff origin, and you'll basically
      see just things getting deleted. Even the mmfile_t simplifications are
      left in a state where the diffs should be readable.
      
      Apologies to Davide, whom I'd love to get feedback on this all from (I
      wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old
      complex format had a helper function for that, but I did my surgery with
      the goal in mind that eventually we _should_ just do
      
      	mmfile_t mf;
      
      	buf = read_sha1_file(sha1, type, &size);
      	mf->ptr = buf;
      	mf->size = size;
      	.. use "mf" directly ..
      
      which was really a nightmare with the old "helpful" mmfile_t, and really
      is that easy with the new cut-down interfaces).
      
      [ Btw, as any hawk-eye can see from the diff, this was actually generated
        with itself, so it is "self-hosting". That's about all the testing it
        has gotten, along with the above kernel diff, which eye-balls correctly,
        but shows the newline issue when you double-check it with "git-apply" ]
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      Signed-off-by: default avatarJunio C Hamano <junkio@cox.net>
      3443546f