1. 18 Aug, 2014 2 commits
  2. 24 Feb, 2014 3 commits
    • Duy Nguyen's avatar
      diff: do not quit early on stat-dirty files · f34b205f
      Duy Nguyen authored
      When QUICK is set (i.e. with --quiet) we try to do as little work as
      possible, stopping after seeing the first change. stat-dirty is
      considered a "change" but it may turn out not, if no actual content is
      changed. The actual content test is performed too late in the process
      and the shortcut may be taken prematurely, leading to incorrect return
      code.
      
      Assume we do "git diff --quiet". If we have a stat-dirty file "a" and
      a really dirty file "b". We break the loop in run_diff_files() and
      stop after "a" because we have got a "change". Later in
      diffcore_skip_stat_unmatch() we find out "a" is actually not
      changed. But there's nothing else in the diff queue, we incorrectly
      declare "no change", ignoring the fact that "b" is changed.
      
      This also happens to "git diff --quiet HEAD" when it hits
      diff_can_quit_early() in oneway_diff().
      
      This patch does the content test earlier in order to keep going if "a"
      is unchanged. The test result is cached so that when
      diffcore_skip_stat_unmatch() is done in the end, we spend no cycles on
      re-testing "a".
      Reported-by: Toshihiro Iwamoto's avatarIWAMOTO Toshihiro <iwamoto@valinux.co.jp>
      Signed-off-by: Duy Nguyen's avatarNguyễn Thái Ngọc Duy <pclouds@gmail.com>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      f34b205f
    • Kirill Smelkov's avatar
      diffcore-order: export generic ordering interface · 1df4320f
      Kirill Smelkov authored
      diffcore_order() interface only accepts a queue of `struct
      diff_filepair`.
      
      In the next patches, we'll want to order `struct combine_diff_path`
      by path, so let's first rework diffcore-order to also provide
      generic low-level interface for ordering arbitrary objects, provided
      they have path accessors.
      
      The new interface is:
      
          - `struct obj_order`    for describing objects to ordering routine, and
          - order_objects()       for actually doing the ordering work.
      Signed-off-by: default avatarKirill Smelkov <kirr@mns.spb.ru>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      1df4320f
    • Richard Lowe's avatar
      diffcore.h: be explicit about the signedness of is_binary · 7d0a9a75
      Richard Lowe authored
      Bitfields need to specify their signedness explicitly or the compiler is
      free to default as it sees fit.  With compilers that default 'unsigned'
      (SUNWspro 12 seems to do this) the tri-state nature of is_binary
      vanishes and all files are treated as binary.
      Signed-off-by: default avatarRichard Lowe <richlowe@richlowe.net>
      Acked-by: default avatarJeff King <peff@peff.net>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      7d0a9a75
  3. 17 Jan, 2014 5 commits
    • Jeff King's avatar
      diff_filespec: use only 2 bits for is_binary flag · cbfe47b6
      Jeff King authored
      The is_binary flag needs only three values: -1, 0, and 1.
      However, we use a whole 32-bit int for it on most systems
      (both 32- and 64- bit).
      
      Instead, we can mark it to use only 2 bits. On 32-bit
      systems, this lets it end up as part of the bitfield above
      (saving 4 bytes). On 64-bit systems, we don't see any change
      (because the savings end up as padding), but it does leave
      room for another "free" 32-bit value to be added later.
      Signed-off-by: default avatarJeff King <peff@peff.net>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      cbfe47b6
    • Jeff King's avatar
      diff_filespec: reorder is_binary field · b38f70a8
      Jeff King authored
      The middle of the diff_filespec struct contains a mixture of
      ints, shorts, and bit-fields, followed by a pointer. On an
      x86-64 system with an LP64 or LLP64 data model (i.e., most
      of them), the integers and flags end up being padded out by
      41 bits to put the pointer at an 8-byte boundary.
      
      After the pointer, we have the "int is_binary" field, which
      is only 32 bits. We end up wasting another 32 bits to pad
      the struct size up to a multiple of 64 bits.
      
      We can move the is_binary field before the pointer, which
      lets the compiler store it where we used to have padding.
      This shrinks the top padding to only 9 bits (from the
      bit-fields), and eliminates the bottom padding entirely,
      dropping the struct size from 88 to 80 bytes.
      
      On a 32-bit system, there is no benefit, but nor should
      there be any harm (we only need 4-byte alignment there, so
      we were already using only 9 bits of padding).
      Signed-off-by: default avatarJeff King <peff@peff.net>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      b38f70a8
    • Jeff King's avatar
      diff_filespec: drop xfrm_flags field · 428d52a5
      Jeff King authored
      The only mention of this field in the code is by some
      debugging code which prints it out (and it will always be
      zero, since we never touch it otherwise). It was obsoleted
      very early on by 25d5ea41 ([PATCH] Redo rename/copy detection
      logic., 2005-05-24).
      Signed-off-by: default avatarJeff King <peff@peff.net>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      428d52a5
    • Jeff King's avatar
      diff_filespec: drop funcname_pattern_ident field · 5b711b20
      Jeff King authored
      This struct field was obsoleted by be58e70d (diff: unify
      external diff and funcname parsing code, 2008-10-05), but we
      forgot to remove it.
      Signed-off-by: default avatarJeff King <peff@peff.net>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      5b711b20
    • Jeff King's avatar
      diff_filespec: reorder dirty_submodule macro definitions · b837f5d6
      Jeff King authored
      diff_filespec has a 2-bit "dirty_submodule" field and
      defines two flags as macros. Originally these were right
      next to each other, but a new field was accidentally added
      in between in commit 4682d852. This patch puts the field and
      its flags back together.
      
      Using an enum like:
      
        enum {
      	  DIRTY_SUBMODULE_UNTRACKED = 1,
      	  DIRTY_SUBMODULE_MODIFIED = 2
        } dirty_submodule;
      
      would be more obvious, but it bloats the structure. Limiting
      the enum size like:
      
        } dirty_submodule : 2;
      
      might work, but it is not portable.
      Signed-off-by: default avatarJeff King <peff@peff.net>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      b837f5d6
  4. 29 Jul, 2012 1 commit
    • Jeff King's avatar
      diff: do not use null sha1 as a sentinel value · e5450100
      Jeff King authored
      The diff code represents paths using the diff_filespec
      struct. This struct has a sha1 to represent the sha1 of the
      content at that path, as well as a sha1_valid member which
      indicates whether its sha1 field is actually useful. If
      sha1_valid is not true, then the filespec represents a
      working tree file (e.g., for the no-index case, or for when
      the index is not up-to-date).
      
      The diff_filespec is only used internally, though. At the
      interfaces to the diff subsystem, callers feed the sha1
      directly, and we create a diff_filespec from it. It's at
      that point that we look at the sha1 and decide whether it is
      valid or not; callers may pass the null sha1 as a sentinel
      value to indicate that it is not.
      
      We should not typically see the null sha1 coming from any
      other source (e.g., in the index itself, or from a tree).
      However, a corrupt tree might have a null sha1, which would
      cause "diff --patch" to accidentally diff the working tree
      version of a file instead of treating it as a blob.
      
      This patch extends the edges of the diff interface to accept
      a "sha1_valid" flag whenever we accept a sha1, and to use
      that flag when creating a filespec. In some cases, this
      means passing the flag through several layers, making the
      code change larger than would be desirable.
      
      One alternative would be to simply die() upon seeing
      corrupted trees with null sha1s. However, this fix more
      directly addresses the problem (while bogus sha1s in a tree
      are probably a bad thing, it is really the sentinel
      confusion sending us down the wrong code path that is what
      makes it devastating). And it means that git is more capable
      of examining and debugging these corrupted trees. For
      example, you can still "diff --raw" such a tree to find out
      when the bogus entry was introduced; you just cannot do a
      "--patch" diff (just as you could not with any other
      corrupted tree, as we do not have any content to diff).
      Signed-off-by: default avatarJeff King <peff@peff.net>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      e5450100
  5. 28 Jun, 2012 1 commit
    • Junio C Hamano's avatar
      diff-index.c: "git diff" has no need to read blob from the standard input · 4682d852
      Junio C Hamano authored
      Only "diff --no-index -" does.  Bolting the logic into the low-level
      function diff_populate_filespec() was a layering violation from day
      one.  Move populate_from_stdin() function out of the generic diff.c
      to its only user, diff-index.c.
      
      Also make sure "-" from the command line stays a special token "read
      from the standard input", even if we later decide to sanitize the
      result from prefix_filename() function in a few obvious ways,
      e.g. removing unnecessary "./" prefix, duplicated slashes "//" in
      the middle, etc.
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      4682d852
  6. 21 Aug, 2011 1 commit
    • Junio C Hamano's avatar
      combine-diff: support format_callback · 25e5e2bf
      Junio C Hamano authored
      This teaches combine-diff machinery to feed a combined merge to a callback
      function when DIFF_FORMAT_CALLBACK is specified.
      
      So far, format callback functions are not used for anything but 2-way
      diffs. A callback is given a diff_queue_struct, which is an array of
      diff_filepair. As its name suggests, a diff_filepair is a _pair_ of
      diff_filespec that represents a single preimage and a single postimage.
      
      Since "diff -c" is to compare N parents with a single merge result and
      filter out any paths whose result match one (or more) of the parent(s),
      its output has to be able to represent N preimages and 1 postimage. For
      this reason, a callback function that inspects a diff_filepair that
      results from this new infrastructure can and is expected to view the
      preimage side (i.e. pair->one) as an array of diff_filespec. Each element
      in the array, except for the last one, is marked with "has_more_entries"
      bit, so that the same callback function can be used for 2-way diffs and
      combined diffs.
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      25e5e2bf
  7. 31 Aug, 2010 1 commit
  8. 13 Aug, 2010 1 commit
    • Junio C Hamano's avatar
      diff --follow: do call diffcore_std() as necessary · 44c48a90
      Junio C Hamano authored
      Usually, diff frontends populate the output queue with filepairs without
      any rename information and call diffcore_std() to sort the renames out.
      When --follow is in effect, however, diff-tree family of frontend has a
      hack that looks like this:
      
          diff-tree frontend
          -> diff_tree_sha1()
             . populate diff_queued_diff
             . if --follow is in effect and there is only one change that
               creates the target path, then
             -> try_to_follow_renames()
      	  -> diff_tree_sha1() with no pathspec but with -C
      	  -> diffcore_std() to find renames
      	  . if rename is found, tweak diff_queued_diff and put a
      	    single filepair that records the found rename there
          -> diffcore_std()
             . tweak elements on diff_queued_diff by
             - rename detection
             - path ordering
             - pickaxe filtering
      
      We need to skip parts of the second call to diffcore_std() that is related
      to rename detection, and do so only when try_to_follow_renames() did find
      a rename.  Earlier 1da6175d (Make diffcore_std only can run once before a
      diff_flush, 2010-05-06) tried to deal with this issue incorrectly; it
      unconditionally disabled any second call to diffcore_std().
      
      This hopefully fixes the breakage.
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      44c48a90
  9. 12 Aug, 2010 1 commit
  10. 09 Aug, 2010 1 commit
  11. 07 May, 2010 2 commits
  12. 05 Mar, 2010 1 commit
    • Jens Lehmann's avatar
      git diff --submodule: Show detailed dirty status of submodules · c7e1a736
      Jens Lehmann authored
      When encountering a dirty submodule while doing "git diff --submodule"
      print an extra line for new untracked content and another for modified
      but already tracked content. And if the HEAD of the submodule is equal
      to the ref diffed against in the superproject, drop the output which
      would just show the same SHA1s and no commit message headlines.
      
      To achieve that, the dirty_submodule bitfield is expanded to two bits.
      The output of "git status" inside the submodule is parsed to set the
      according bits.
      Signed-off-by: default avatarJens Lehmann <Jens.Lehmann@web.de>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      c7e1a736
  13. 19 Jan, 2010 1 commit
    • Jens Lehmann's avatar
      Performance optimization for detection of modified submodules · e3d42c47
      Jens Lehmann authored
      In the worst case is_submodule_modified() got called three times for
      each submodule. The information we got from scanning the whole
      submodule tree the first time can be reused instead.
      
      New parameters have been added to diff_change() and diff_addremove(),
      the information is stored in a new member of struct diff_filespec. Its
      value is then reused instead of calling is_submodule_modified() again.
      
      When no explicit "-dirty" is needed in the output the call to
      is_submodule_modified() is not necessary when the submodules HEAD
      already disagrees with the ref of the superproject, as this alone
      marks it as modified. To achieve that, get_stat_data() got an extra
      argument.
      Signed-off-by: default avatarJens Lehmann <Jens.Lehmann@web.de>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      e3d42c47
  14. 02 Nov, 2008 1 commit
  15. 18 Oct, 2008 1 commit
    • Jeff King's avatar
      diff: introduce diff.<driver>.binary · 122aa6f9
      Jeff King authored
      The "diff" gitattribute is somewhat overloaded right now. It
      can say one of three things:
      
        1. this file is definitely binary, or definitely not
           (i.e., diff or !diff)
        2. this file should use an external diff engine (i.e.,
           diff=foo, diff.foo.command = custom-script)
        3. this file should use particular funcname patterns
           (i.e., diff=foo, diff.foo.(x?)funcname = some-regex)
      
      Most of the time, there is no conflict between these uses,
      since using one implies that the other is irrelevant (e.g.,
      an external diff engine will decide for itself whether the
      file is binary).
      
      However, there is at least one conflicting situation: there
      is no way to say "use the regular rules to determine whether
      this file is binary, but if we do diff it textually, use
      this funcname pattern." That is, currently setting diff=foo
      indicates that the file is definitely text.
      
      This patch introduces a "binary" config option for a diff
      driver, so that one can explicitly set diff.foo.binary. We
      default this value to "don't know". That is, setting a diff
      attribute to "foo" and using "diff.foo.funcname" will have
      no effect on the binaryness of a file. To get the current
      behavior, one can set diff.foo.binary to true.
      
      This patch also has one additional advantage: it cleans up
      the interface to the userdiff code a bit. Before, calling
      code had to know more about whether attributes were false,
      true, or unset to determine binaryness. Now that binaryness
      is a property of a driver, we can represent these situations
      just by passing back a driver struct.
      Signed-off-by: default avatarJeff King <peff@peff.net>
      Signed-off-by: default avatarShawn O. Pearce <spearce@spearce.org>
      122aa6f9
  16. 20 Sep, 2008 1 commit
  17. 27 Oct, 2007 2 commits
    • Linus Torvalds's avatar
      copy vs rename detection: avoid unnecessary O(n*m) loops · 64479711
      Linus Torvalds authored
      The core rename detection had some rather stupid code to check if a
      pathname was used by a later modification or rename, which basically
      walked the whole pathname space for all renames for each rename, in
      order to tell whether it was a pure rename (no remaining users) or
      should be considered a copy (other users of the source file remaining).
      
      That's really silly, since we can just keep a count of users around, and
      replace all those complex and expensive loops with just testing that
      simple counter (but this all depends on the previous commit that shared
      the diff_filespec data structure by using a separate reference count).
      
      Note that the reference count is not the same as the rename count: they
      behave otherwise rather similarly, but the reference count is tied to
      the allocation (and decremented at de-allocation, so that when it turns
      zero we can get rid of the memory), while the rename count is tied to
      the renames and is decremented when we find a rename (so that when it
      turns zero we know that it was a rename, not a copy).
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      64479711
    • Linus Torvalds's avatar
      Ref-count the filespecs used by diffcore · 9fb88419
      Linus Torvalds authored
      Rather than copy the filespecs when introducing new versions of them
      (for rename or copy detection), use a refcount and increment the count
      when reusing the diff_filespec.
      
      This avoids unnecessary allocations, but the real reason behind this is
      a future enhancement: we will want to track shared data across the
      copy/rename detection.  In order to efficiently notice when a filespec
      is used by a rename, the rename machinery wants to keep track of a
      rename usage count which is shared across all different users of the
      filespec.
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      9fb88419
  18. 03 Oct, 2007 2 commits
    • Junio C Hamano's avatar
    • Jeff King's avatar
      diffcore-rename: cache file deltas · eede7b7d
      Jeff King authored
      We find rename candidates by computing a fingerprint hash of
      each file, and then comparing those fingerprints. There are
      inherently O(n^2) comparisons, so it pays in CPU time to
      hoist the (rather expensive) computation of the fingerprint
      out of that loop (or to cache it once we have computed it once).
      
      Previously, we didn't keep the filespec information around
      because then we had the potential to consume a great deal of
      memory. However, instead of keeping all of the filespec
      data, we can instead just keep the fingerprint.
      
      This patch implements and uses diff_free_filespec_data_large
      to accomplish that goal. We also have to change
      estimate_similarity not to needlessly repopulate the
      filespec data when we already have the hash.
      
      Practical tests showed 4.5x speedup for a 10% memory usage
      increase.
      Signed-off-by: default avatarJeff King <peff@peff.net>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      eede7b7d
  19. 26 Sep, 2007 1 commit
    • Jeff King's avatar
      diffcore-rename: cache file deltas · a5a3878b
      Jeff King authored
      We find rename candidates by computing a fingerprint hash of
      each file, and then comparing those fingerprints. There are
      inherently O(n^2) comparisons, so it pays in CPU time to
      hoist the (rather expensive) computation of the fingerprint
      out of that loop (or to cache it once we have computed it once).
      
      Previously, we didn't keep the filespec information around
      because then we had the potential to consume a great deal of
      memory. However, instead of keeping all of the filespec
      data, we can instead just keep the fingerprint.
      
      This patch implements and uses diff_free_filespec_data_large
      to accomplish that goal. We also have to change
      estimate_similarity not to needlessly repopulate the
      filespec data when we already have the hash.
      
      Practical tests showed 4.5x speedup for a 10% memory usage
      increase.
      Signed-off-by: default avatarJeff King <peff@peff.net>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      a5a3878b
  20. 07 Jul, 2007 1 commit
    • Junio C Hamano's avatar
      Fix configuration syntax to specify customized hunk header patterns. · e0e324a4
      Junio C Hamano authored
      This updates the hunk header customization syntax.  The special
      case 'funcname' attribute is gone.
      
      You assign the name of the type of contents to path's "diff"
      attribute as a string value in .gitattributes like this:
      
      	*.java diff=java
      	*.perl diff=perl
      	*.doc diff=doc
      
      If you supply "diff.<name>.funcname" variable via the
      configuration mechanism (e.g. in $HOME/.gitconfig), the value is
      used as the regexp set to find the line to use for the hunk
      header (the variable is called "funcname" because such a line
      typically is the one that has the name of the function in
      programming language source text).
      
      If there is no such configuration, built-in default is used, if
      any.  Currently there are two default patterns: default and java.
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      e0e324a4
  21. 06 Jul, 2007 2 commits
    • Junio C Hamano's avatar
      Per-path attribute based hunk header selection. · f258475a
      Junio C Hamano authored
      This makes"diff -p" hunk headers customizable via gitattributes mechanism.
      It is based on Johannes's earlier patch that allowed to define a single
      regexp to be used for everything.
      
      The mechanism to arrive at the regexp that is used to define hunk header
      is the same as other use of gitattributes.  You assign an attribute, funcname
      (because "diff -p" typically uses the name of the function the patch is about
      as the hunk header), a simple string value.  This can be one of the names of
      built-in pattern (currently, "java" is defined) or a custom pattern name, to
      be looked up from the configuration file.
      
        (in .gitattributes)
        *.java   funcname=java
        *.perl   funcname=perl
      
        (in .git/config)
        [funcname]
          java = ... # ugly and complicated regexp to override the built-in one.
          perl = ... # another ugly and complicated regexp to define a new one.
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      f258475a
    • Junio C Hamano's avatar
      Introduce diff_filespec_is_binary() · 29a3eefd
      Junio C Hamano authored
      This replaces an explicit initialization of filespec->is_binary
      field used for rename/break followed by direct access to that
      field with a wrapper function that lazily iniaitlizes and
      accesses the field.  We would add more attribute accesses for
      the use of diff routines, and it would be better to make this
      abstraction earlier.
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      29a3eefd
  22. 01 Jul, 2007 2 commits
  23. 29 Apr, 2007 1 commit
  24. 07 Jan, 2007 1 commit
    • Junio C Hamano's avatar
      diff-index --cached --raw: show tree entry on the LHS for unmerged entries. · e9c84099
      Junio C Hamano authored
      This updates the way diffcore represents an unmerged pair
      somewhat.  It used to be that entries with mode=0 on both sides
      were used to represent an unmerged pair, but now it has an
      explicit flag.  This is to allow diff-index --cached to report
      the entry from the tree when the path is unmerged in the index.
      
      This is used in updating "git reset <tree> -- <path>" to restore
      absense of the path in the index from the tree.
      Signed-off-by: default avatarJunio C Hamano <junkio@cox.net>
      e9c84099
  25. 03 Aug, 2006 1 commit
    • Junio C Hamano's avatar
      diff.c: do not use pathname comparison to tell renames · ef677686
      Junio C Hamano authored
      The final output from diff used to compare pathnames between
      preimage and postimage to tell if the filepair is a rename/copy.
      By explicitly marking the filepair created by diffcore_rename(),
      the output routine, resolve_rename_copy(), does not have to do
      so anymore.  This helps feeding a filepair that has different
      pathnames in one and two elements to the diff machinery (most
      notably, comparing two blobs).
      Signed-off-by: default avatarJunio C Hamano <junkio@cox.net>
      ef677686
  26. 12 Mar, 2006 1 commit
  27. 04 Mar, 2006 1 commit
  28. 01 Mar, 2006 1 commit