1. 26 Dec, 2018 1 commit
    • log -G: ignore binary files · e0e7cb80
      Thomas Braun authored
      The -G<regex> option of log looks for the differences whose patch text
      contains added/removed lines that match regex.
      
      Currently -G also looks into patches of binary files (whose patch
      format, according to [1], is binary as well).
      
      This has a couple of issues:
      
      - It makes the pickaxe search slow. In a proprietary repository of the
        author with only ~5500 commits and a total .git size of ~300MB
        searching takes ~13 seconds
      
          $ time git log -Gwave > /dev/null
      
          real    0m13,241s
          user    0m12,596s
          sys     0m0,644s
      
        whereas when we ignore binary files with this patch it takes ~4s
      
          $ time ~/devel/git/git log -Gwave > /dev/null
      
          real    0m3,713s
          user    0m3,608s
          sys     0m0,105s
      
        which is a speedup of roughly 3.5x.
      
      - The internally used algorithm for generating patch text is based on
        xdiff, whose documentation states in [1]
      
        > The output format of the binary patch file is proprietary
        > (and binary) and it is basically a collection of copy and insert
        > commands [..]
      
        which means that the current format could change once the internal
        algorithm is changed, as the format is not standardized. In addition,
        the git binary patch format used for preparing patches for git apply
        is *different* from the xdiff format, as can be seen by comparing
      
        git log -p -a
      
          commit 6e95bf4bafccf14650d02ab57f3affe669be10cf
          Author: A U Thor <author@example.com>
          Date:   Thu Apr 7 15:14:13 2005 -0700
      
              modify binary file
      
          diff --git a/data.bin b/data.bin
          index f414c84..edfeb6f 100644
          --- a/data.bin
          +++ b/data.bin
          @@ -1,2 +1,4 @@
           a
           a^@a
          +a
          +a^@a
      
        with git log --binary
      
          commit 6e95bf4bafccf14650d02ab57f3affe669be10cf
          Author: A U Thor <author@example.com>
          Date:   Thu Apr 7 15:14:13 2005 -0700
      
              modify binary file
      
          diff --git a/data.bin b/data.bin
          index f414c84bd3aa25fa07836bb1fb73db784635e24b..edfeb6f501[..]
          GIT binary patch
          literal 12
          QcmYe~N@Pgn0zx1O01)N^ZvX%Q
      
          literal 6
          NcmYe~N@Pgn0ssWg0XP5v
      
        which seems unexpected.
      
      To resolve these issues, this patch makes -G<regex> ignore binary files
      by default. Textconv filters are still supported, and -a/--text
      restores the old and broken behaviour.
      
      The -S<block of text> option of log looks for differences that change
      the number of occurrences of the specified block of text (i.e.
      addition/deletion) in a file. As we want to keep the current behaviour,
      add a test to ensure it stays that way.
      
      [1]: http://www.xmailserver.org/xdiff.html
      
      Signed-off-by: Thomas Braun <thomas.braun@virtuell-zuhause.de>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
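A minimal sketch of the decision this commit describes, not the actual patch: a buffer is treated as binary when a NUL byte appears near its start, and a -G style search skips it unless text mode was requested. The helper names, the `force_text` parameter (modelling -a/--text), and the 8000-byte window are assumptions for illustration; textconv handling is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed heuristic: only a prefix of the buffer is inspected. */
#define SNIFF_BYTES 8000

/* Returns 1 if a NUL byte occurs within the inspected prefix. */
static int looks_binary(const char *buf, size_t size)
{
	size_t n = size < SNIFF_BYTES ? size : SNIFF_BYTES;

	if (n == 0)
		return 0;
	return memchr(buf, '\0', n) != NULL;
}

/*
 * Returns 1 if a -G style search should look at this buffer.
 * force_text is a hypothetical parameter modelling -a/--text.
 */
static int should_grep(const char *buf, size_t size, int force_text)
{
	assert(buf != NULL || size == 0);
	if (force_text)
		return 1;
	return !looks_binary(buf, size);
}
```

With this shape, the cost saving comes from never generating patch text for buffers that `should_grep()` rejects.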
  2. 12 Nov, 2018 1 commit
  3. 05 Nov, 2018 1 commit
  4. 02 Nov, 2018 1 commit
    • xdiff-interface: provide a separate consume callback for hunks · 9346d6d1
      Jeff King authored
      The previous commit taught xdiff to optionally provide the hunk header
      data to a specialized callback. But most users of xdiff actually use our
      more convenient xdi_diff_outf() helper, which ensures that our callbacks
      are always fed whole lines.
      
      Let's plumb the special hunk-callback through this interface, too. It
      will follow the same rule as xdiff when the hunk callback is NULL (i.e.,
      continue to pass a stringified hunk header to the line callback). Since
      we add NULL to each caller, there should be no behavior change yet.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  5. 21 Sep, 2018 2 commits
  6. 21 May, 2018 1 commit
    • regex: do not call `regfree()` if compilation fails · 17154b15
      Martin Ågren authored
      It is apparently undefined behavior to call `regfree()` on a regex where
      `regcomp()` failed. The language in [1] is a bit muddy, at least to me,
      but the clearest hint is this (`preg` is the `regex_t *`):
      
          Upon successful completion, the regcomp() function shall return 0.
          Otherwise, it shall return an integer value indicating an error as
          described in <regex.h>, and the content of preg is undefined.
      
      Funnily enough, there is also the `regerror()` function which should be
      given a pointer to such a "failed" `regex_t` -- the content of which
      would supposedly be undefined -- and which may investigate it to come up
      with a detailed error message.
      
      In any case, the example in that document shows how `regfree()` is not
      called after `regcomp()` fails.
      
      We have quite a few users of this API and most get this right. These
      three users do not.
      
      Several implementations can handle this just fine [2] and these code paths
      supposedly have not wreaked havoc or we'd have heard about it. (These
      are all in code paths where git got bad input and is just about to die
      anyway.) But let's just avoid the issue altogether.
      
      [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/regcomp.html
      
      [2] https://www.redhat.com/archives/libvir-list/2013-September/msg00262.html
      
      Researched-by: Eric Sunshine <sunshine@sunshineco.com>
      Signed-off-by: Martin Ågren <martin.agren@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
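The rule from this commit can be sketched as follows: call `regfree()` only on a `regex_t` that `regcomp()` compiled successfully, while still passing the failed one to `regerror()` for a message. The function name `match_once` and its return convention are assumptions for illustration, not code from git.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/*
 * Returns 1 on match, 0 on no match, -1 if the pattern fails to compile.
 * regfree() is called only after regcomp() succeeded.
 */
static int match_once(const char *pattern, const char *text)
{
	regex_t re;
	char msg[256];
	int err, hit;

	assert(pattern != NULL && text != NULL);
	err = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
	if (err) {
		/* regerror() may inspect the failed regex_t; do not regfree() it. */
		regerror(err, &re, msg, sizeof(msg));
		fprintf(stderr, "invalid regex: %s\n", msg);
		return -1;
	}
	hit = regexec(&re, text, 0, NULL, 0) == 0;
	regfree(&re);
	return hit;
}
```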
  7. 04 Jan, 2018 3 commits
  8. 01 Nov, 2017 2 commits
    • diff: make struct diff_flags members lowercase · 0d1e0e78
      Brandon Williams authored
      Now that the flags stored in struct diff_flags are being accessed
      directly and not through macros, change all struct members from being
      uppercase to lowercase.
      
      This conversion is done using the following semantic patch:
      
      	@@
      	expression E;
      	@@
      	- E.RECURSIVE
      	+ E.recursive
      
      	@@
      	expression E;
      	@@
      	- E.TREE_IN_RECURSIVE
      	+ E.tree_in_recursive
      
      	@@
      	expression E;
      	@@
      	- E.BINARY
      	+ E.binary
      
      	@@
      	expression E;
      	@@
      	- E.TEXT
      	+ E.text
      
      	@@
      	expression E;
      	@@
      	- E.FULL_INDEX
      	+ E.full_index
      
      	@@
      	expression E;
      	@@
      	- E.SILENT_ON_REMOVE
      	+ E.silent_on_remove
      
      	@@
      	expression E;
      	@@
      	- E.FIND_COPIES_HARDER
      	+ E.find_copies_harder
      
      	@@
      	expression E;
      	@@
      	- E.FOLLOW_RENAMES
      	+ E.follow_renames
      
      	@@
      	expression E;
      	@@
      	- E.RENAME_EMPTY
      	+ E.rename_empty
      
      	@@
      	expression E;
      	@@
      	- E.HAS_CHANGES
      	+ E.has_changes
      
      	@@
      	expression E;
      	@@
      	- E.QUICK
      	+ E.quick
      
      	@@
      	expression E;
      	@@
      	- E.NO_INDEX
      	+ E.no_index
      
      	@@
      	expression E;
      	@@
      	- E.ALLOW_EXTERNAL
      	+ E.allow_external
      
      	@@
      	expression E;
      	@@
      	- E.EXIT_WITH_STATUS
      	+ E.exit_with_status
      
      	@@
      	expression E;
      	@@
      	- E.REVERSE_DIFF
      	+ E.reverse_diff
      
      	@@
      	expression E;
      	@@
      	- E.CHECK_FAILED
      	+ E.check_failed
      
      	@@
      	expression E;
      	@@
      	- E.RELATIVE_NAME
      	+ E.relative_name
      
      	@@
      	expression E;
      	@@
      	- E.IGNORE_SUBMODULES
      	+ E.ignore_submodules
      
      	@@
      	expression E;
      	@@
      	- E.DIRSTAT_CUMULATIVE
      	+ E.dirstat_cumulative
      
      	@@
      	expression E;
      	@@
      	- E.DIRSTAT_BY_FILE
      	+ E.dirstat_by_file
      
      	@@
      	expression E;
      	@@
      	- E.ALLOW_TEXTCONV
      	+ E.allow_textconv
      
      	@@
      	expression E;
      	@@
      	- E.TEXTCONV_SET_VIA_CMDLINE
      	+ E.textconv_set_via_cmdline
      
      	@@
      	expression E;
      	@@
      	- E.DIFF_FROM_CONTENTS
      	+ E.diff_from_contents
      
      	@@
      	expression E;
      	@@
      	- E.DIRTY_SUBMODULES
      	+ E.dirty_submodules
      
      	@@
      	expression E;
      	@@
      	- E.IGNORE_UNTRACKED_IN_SUBMODULES
      	+ E.ignore_untracked_in_submodules
      
      	@@
      	expression E;
      	@@
      	- E.IGNORE_DIRTY_SUBMODULES
      	+ E.ignore_dirty_submodules
      
      	@@
      	expression E;
      	@@
      	- E.OVERRIDE_SUBMODULE_CONFIG
      	+ E.override_submodule_config
      
      	@@
      	expression E;
      	@@
      	- E.DIRSTAT_BY_LINE
      	+ E.dirstat_by_line
      
      	@@
      	expression E;
      	@@
      	- E.FUNCCONTEXT
      	+ E.funccontext
      
      	@@
      	expression E;
      	@@
      	- E.PICKAXE_IGNORE_CASE
      	+ E.pickaxe_ignore_case
      
      	@@
      	expression E;
      	@@
      	- E.DEFAULT_FOLLOW_RENAMES
      	+ E.default_follow_renames
      Signed-off-by: Brandon Williams <bmwill@google.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • diff: remove DIFF_OPT_TST macro · 3b69daed
      Brandon Williams authored
      Remove the `DIFF_OPT_TST` macro and instead access the flags directly.
      This conversion is done using the following semantic patch:
      
      	@@
      	expression E;
      	identifier fld;
      	@@
      	- DIFF_OPT_TST(&E, fld)
      	+ E.flags.fld
      
      	@@
      	type T;
      	T *ptr;
      	identifier fld;
      	@@
      	- DIFF_OPT_TST(ptr, fld)
      	+ ptr->flags.fld
      Signed-off-by: Brandon Williams <bmwill@google.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  9. 18 Mar, 2017 1 commit
    • pickaxe: fix segfault with '-S<...> --pickaxe-regex' · f53c5de2
      Gábor Szeder authored
      'git {log,diff,...} -S<...> --pickaxe-regex' can segfault as a result
      of out-of-bounds memory reads.
      
      diffcore-pickaxe.c:contains() looks for all matches of the given regex
      in a buffer in a loop, advancing the buffer pointer to the end of the
      last match in each iteration.  When we switched to REG_STARTEND in
      b7d36ffc (regex: use regexec_buf(), 2016-09-21), we started passing
      the size of that buffer to the regexp engine, too.  Unfortunately,
      this buffer size is never updated on subsequent iterations, and as the
      buffer pointer advances on each iteration, this "bufptr+bufsize"
      points past the end of the buffer.  This results in a segmentation
      fault if that memory can't be accessed.  In the case of 'git log' it can
      also result in erroneously listed commits, if the memory past the end
      of buffer is accessible and happens to contain data matching the
      regex.
      
      Reduce the buffer size on each iteration as the buffer pointer is
      advanced, thus maintaining the correct end of buffer location.
      Furthermore, make sure that the buffer pointer is not dereferenced in
      the control flow statements when we already reached the end of the
      buffer.
      
      The new test is flaky; I've never seen it fail on my Linux box even
      without the fix, but this is expected according to db5dfa33 (regex:
      -G<pattern> feeds a non NUL-terminated string to regexec() and fails,
      2016-09-21).  However, it did fail on Travis CI with the first (and
      incomplete) version of the fix, and based on that commit message I
      would expect the new test without the fix to fail most of the time on
      Windows.
      Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
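The fix described in this commit can be sketched as a match-counting loop that shrinks the remaining size by exactly as much as it advances the pointer, so that pointer plus size always names the true end of the buffer. This is an illustration, not the code in diffcore-pickaxe.c: `count_matches` is a hypothetical name, and the sketch assumes the platform's `<regex.h>` provides the non-POSIX `REG_STARTEND` extension.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Counts non-overlapping matches of re in the first size bytes of buf. */
static unsigned count_matches(const regex_t *re, const char *buf, size_t size)
{
	unsigned count = 0;
	int eflags = 0;
	regmatch_t m;

	assert(re != NULL);
	while (size > 0) {
		/* With REG_STARTEND, m delimits the bytes to search. */
		m.rm_so = 0;
		m.rm_eo = (regoff_t)size;
		if (regexec(re, buf, 1, &m, eflags | REG_STARTEND) != 0)
			break;
		count++;
		eflags |= REG_NOTBOL;
		/* Advance the pointer and shrink the size by the same amount. */
		buf += m.rm_eo;
		size -= (size_t)m.rm_eo;
		/* Step over an empty match so the loop makes progress. */
		if (m.rm_so == m.rm_eo && size > 0) {
			buf++;
			size--;
		}
	}
	return count;
}
```

The loop condition checks `size` before anything touches `buf`, which mirrors the commit's second point about not dereferencing the pointer once the end of the buffer is reached.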
  10. 21 Sep, 2016 1 commit
    • regex: use regexec_buf() · b7d36ffc
      Johannes Schindelin authored
      The new regexec_buf() function operates on buffers with an explicitly
      specified length, rather than NUL-terminated strings.
      
      We need to use this function whenever the buffer we want to pass to
      regexec(3) may have been mmap(2)ed (and is hence not NUL-terminated).
      
      Note: the original motivation for this patch was to fix a bug where
      `git diff -G <regex>` would crash. This patch converts more callers,
      though, some of which allocated memory to construct NUL-terminated strings,
      or worse, modified buffers to temporarily insert NULs while calling
      regexec(3).  By converting them to use regexec_buf(), the code has
      become much cleaner.
      Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
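A wrapper of the kind this commit describes might look roughly like the sketch below: it searches exactly `size` bytes of a buffer that need not be NUL-terminated. The name `regexec_buf_sketch` is deliberately different from git's helper, and the sketch assumes `REG_STARTEND` is available in the platform's `<regex.h>`; it does not show any fallback for platforms without it.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Like regexec(3), but searches exactly size bytes of buf, which need
 * not be NUL-terminated. Relies on the REG_STARTEND extension, which
 * reads the search bounds from pmatch[0].
 */
static int regexec_buf_sketch(const regex_t *preg, const char *buf, size_t size,
			      size_t nmatch, regmatch_t pmatch[], int eflags)
{
	assert(nmatch > 0 && pmatch != NULL);
	pmatch[0].rm_so = 0;
	pmatch[0].rm_eo = (regoff_t)size;
	return regexec(preg, buf, nmatch, pmatch, eflags | REG_STARTEND);
}
```

Callers no longer need to copy the data or temporarily write a NUL into it, which is the cleanup the commit message mentions.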
  11. 01 Jul, 2016 2 commits
  12. 28 Sep, 2015 1 commit
    • react to errors in xdi_diff · 3efb9880
      Jeff King authored
      When we call into xdiff to perform a diff, we generally lose
      the return code completely. Typically by ignoring the return
      of our xdi_diff wrapper, but sometimes we even propagate
      that return value up and then ignore it later.  This can
      lead to us silently producing incorrect diffs (e.g., "git
      log" might produce no output at all, not even a diff header,
      for a content-level diff).
      
      In practice this does not happen very often, because the
      typical reason for xdiff to report failure is that
      malloc() failed (it uses straight malloc, and not our
      xmalloc wrapper).  But it could also happen when xdiff
      triggers one of our callbacks, which returns an error (e.g.,
      outf() in builtin/rerere.c tries to report a write failure
      in this way). And the next patch also plans to add more
      failure modes.
      
      Let's notice an error return from xdiff and react
      appropriately. In most of the diff.c code, we can simply
      die(), which matches the surrounding code (e.g., that is
      what we do if we fail to load a file for diffing in the
      first place). This is not that elegant, but we are probably
      better off dying to let the user know there was a problem,
      rather than simply generating bogus output.
      
      We could also just die() directly in xdi_diff, but the
      callers typically have a bit more context, and can provide a
      better message (and if we do later decide to pass errors up,
      we're one step closer to doing so).
      
      There is one interesting case, which is in diff_grep(). Here
      if we cannot generate the diff, there is nothing to match,
      and we silently return "no hits". This is actually what the
      existing code does already, but we make it a little more
      explicit.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  13. 24 Mar, 2014 5 commits
  14. 07 Jul, 2013 1 commit
  15. 03 Jun, 2013 1 commit
  16. 05 Apr, 2013 6 commits
    • diffcore-pickaxe: unify code for log -S/-G · 61690bf4
      Jeff King authored
      The logic flow of has_changes() used for "log -S" and diff_grep()
      used for "log -G" are essentially the same.  See if we have both
      sides that could be different in any interesting way, slurp the
      contents in core, possibly after applying textconv, inspect the
      contents, clean-up and report the result.  The only difference
      between the two is how "inspect" step works.
      
      Unify this codeflow in a helper, pickaxe_match(), which takes a
      callback function that implements the specific "inspect" step.
      
      After removing the common scaffolding code from the existing
      has_changes() and diff_grep(), they each become such a callback
      function suitable for passing to pickaxe_match().
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
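The refactoring described here, shared scaffolding plus a pluggable "inspect" step, can be sketched with a function pointer. Everything below is hypothetical and much simpler than pickaxe_match(): `struct side`, `match_pair`, and `count_changed` are illustrative names, textconv and cleanup are omitted, and the only callback shown is a -S style one that compares occurrence counts of a literal needle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer type standing in for one side of a file pair. */
struct side {
	const char *ptr;	/* NULL when the file is absent on this side */
	size_t size;
};

/* The "inspect" step: returns nonzero when the pair is interesting. */
typedef int (*inspect_fn)(const struct side *one, const struct side *two,
			  const char *needle);

/* Shared scaffolding: handle the trivial case, then delegate to inspect. */
static int match_pair(const struct side *one, const struct side *two,
		      const char *needle, inspect_fn inspect)
{
	assert(one && two && needle && inspect);
	if (!one->ptr && !two->ptr)
		return 0;	/* nothing on either side */
	return inspect(one, two, needle);
}

/* Counts non-overlapping occurrences of needle in s. */
static unsigned count_in(const struct side *s, const char *needle)
{
	size_t n = strlen(needle), i = 0;
	unsigned count = 0;

	if (!s->ptr || n == 0)
		return 0;
	while (i + n <= s->size) {
		if (memcmp(s->ptr + i, needle, n) == 0) {
			count++;
			i += n;
		} else {
			i++;
		}
	}
	return count;
}

/* -S style callback: interesting when the occurrence count changes. */
static int count_changed(const struct side *one, const struct side *two,
			 const char *needle)
{
	return count_in(one, needle) != count_in(two, needle);
}
```

A -G style callback would have the same signature but inspect the generated diff text instead of counting occurrences.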
    • diffcore-pickaxe: fix leaks in "log -S<block>" and "log -G<pattern>" · 88ff684d
      Junio C Hamano authored
      The diff_grep() and has_changes() functions had early return
      codepaths for unmerged filepairs, which simply returned 0.  When we
      taught textconv filter to them, one was ignored and continued to
      return early without freeing the result filtered by textconv, and
      the other had a failed attempt to fix, which allowed the planned
      return value 0 to be overwritten by a bogus call to contains().
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • diffcore-pickaxe: port optimization from has_changes() to diff_grep() · ebb72262
      Junio C Hamano authored
      These two functions are called in the same codeflow to implement
      "log -S<block>" and "log -G<pattern>", respectively, but the latter
      lacked two obvious optimizations the former implemented, namely:
      
       - When a pickaxe limit is not given at all, they should return
         without wasting any cycle;
      
       - When both sides of the filepair are the same, and the same
         textconv conversion applies to them, return early, as there will be
         no interesting differences between the two anyway.
      
      Also release the filespec data once the processing is done (this is
      not about leaking memory--it is about releasing data we finished
      looking at as early as possible).
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • diffcore-pickaxe: respect --no-textconv · a8f61094
      Simon Ruderich authored
      git log -S doesn't respect --no-textconv:
      
          $ echo '*.txt diff=wrong' > .gitattributes
          $ git -c diff.wrong.textconv='xxx' log --no-textconv -Sfoo
          error: cannot run xxx: No such file or directory
          fatal: unable to read files to diff
      Reported-by: Matthieu Moy <Matthieu.Moy@grenoble-inp.fr>
      Signed-off-by: Simon Ruderich <simon@ruderich.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • diffcore-pickaxe: remove fill_one() · 7cdb9b42
      Jeff King authored
      fill_one is _almost_ identical to just calling fill_textconv; the
      exception is that for the !DIFF_FILE_VALID case, fill_textconv gives us
      an empty buffer rather than a NULL one. Since we currently use the NULL
      pointer as a signal that the file is not present on one side of the
      diff, we must now switch to using DIFF_FILE_VALID to make the same
      check.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Simon Ruderich <simon@ruderich.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • diffcore-pickaxe: remove unnecessary call to get_textconv() · bc615898
      Simon Ruderich authored
      The fill_one() function is responsible for finding and filling the
      textconv filter as necessary, and is called by diff_grep() function
      that implements "git log -G<pattern>".
      
      The has_changes() function that implements "git log -S<block>" calls
      get_textconv() for two sides being compared, before it checks to see
      if it was asked to perform the pickaxe limiting.  Move the code
      around to avoid this wastage.
      
      After has_changes() calls get_textconv() to obtain textconv for both
      sides, fill_one() is called to use them.
      
      By adding get_textconv() to diff_grep() and relieving fill_one() of
      responsibility to find the textconv filter, we can avoid calling
      get_textconv() twice in has_changes().
      
      With this change it's also no longer necessary for fill_one() to
      modify the textconv argument, therefore pass a pointer instead of a
      pointer to a pointer.
      Signed-off-by: Simon Ruderich <simon@ruderich.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  17. 28 Oct, 2012 3 commits
    • pickaxe: use textconv for -S counting · ef90ab66
      Jeff King authored
      We currently just look at raw blob data when using "-S" to
      pickaxe. This is mostly historical, as pickaxe predates the
      textconv feature. If the user has bothered to define a
      textconv filter, it is more likely that their search string will be
      on the textconv output, as that is what they will see in the
      diff (and we do not even provide a mechanism for them to
      search for binary needles that contain NUL characters).
      
      This patch teaches "-S" to use textconv, just as we
      already do for "-G".
      Signed-off-by: Jeff King <peff@peff.net>
    • pickaxe: hoist empty needle check · 8fa4b09f
      Jeff King authored
      If we are given an empty pickaxe needle like "git log -S ''",
      it is impossible for us to find anything (because no matter
      what the content, the count will always be 0). We currently
      check this at the lowest level of contains(). Let's hoist
      the logic much earlier to has_changes(), so that it is
      simpler to return our answer before loading any blob data.
      Signed-off-by: Jeff King <peff@peff.net>
    • diff_grep: use textconv buffers for add/deleted files · b1c2f57d
      Jeff King authored
      If you use "-G" to grep a diff, we will apply a configured
      textconv filter to the data before generating the diff.
      However, if the diff is an addition or deletion, we do not
      bother running the diff at all, and just look for the token
      in the added (or removed) content. This works because we
      know that the diff must contain every line of content.
      
      However, while we used the textconv-derived buffers in the
      regular diff, we accidentally passed the original unmodified
      buffers to regexec when checking the added or removed
      content. This could lead to an incorrect answer.
      
      Worse, in some cases we might have a textconv buffer but no
      original buffer (e.g., if we pulled the textconv data from
      cache, or if we reused a working tree file when generating
      it). In that case, we could actually feed NULL to regexec
      and segfault.
      Reported-by: Peter Oberndorfer <kumbayo84@arcor.de>
      Signed-off-by: Jeff King <peff@peff.net>
  18. 29 Feb, 2012 1 commit
    • pickaxe: allow -i to search in patch case-insensitively · accccde4
      Junio C Hamano authored
      "git log -S<string>" is a useful way to find the last commit in the
      codebase that touched the <string>. As it was designed to be used by a
      porcelain script to dig the history starting from a block of text that
      appears in the starting commit, it never had to look for anything but an
      exact match.
      
      When used by an end user who wants to look for the last commit that
      removed a string (e.g. name of a variable) that he vaguely remembers,
      however, it is useful to support case insensitive match.
      
      When given the "--regexp-ignore-case" (or "-i") option, which originally
      was designed to affect case sensitivity of the search done in the commit
      log part, e.g. "log --grep", the matches made with -S/-G pickaxe search are
      now done case insensitively.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  19. 07 Oct, 2011 6 commits