  1. 17 Dec, 2014 1 commit
    • read-cache: optionally disallow HFS+ .git variants · a42643aa
      Jeff King authored
      The point of disallowing ".git" in the index is that we
      would never want to accidentally overwrite files in the
      repository directory. But this means we need to respect the
      filesystem's idea of when two paths are equal. The prior
      commit added a helper to make such a comparison for HFS+;
      let's use it in verify_path.
      We make this check optional for two reasons:
        1. It restricts the set of allowable filenames, which is
           unnecessary for people who are not on HFS+. In practice
           this probably doesn't matter, though, as the restricted
           names are rather obscure and almost certainly would
           never come up in practice.
        2. It has a minor performance penalty for every path we
           insert into the index.
      This patch ties the check to the core.protectHFS config
      option. Though this is expected to be most useful on OS X,
      we allow it to be set everywhere, as HFS+ may be mounted on
      other platforms. The variable does default to on for OS X,
      Signed-off-by: Jeff King <[email protected]>
      Signed-off-by: Junio C Hamano <[email protected]>
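The comparison this commit message describes can be sketched as follows. This is not git's helper: the function names are hypothetical, the list of code points that HFS+ ignores is an assumption based on commonly cited HFS+ behavior, and the UTF-8 decoder does no validation. The sketch only tests whether a path begins with a component equivalent to ".git".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 code point and advance *p (sketch: no validation). */
static uint32_t next_codepoint(const unsigned char **p)
{
	const unsigned char *s = *p;
	uint32_t c = *s++;
	int extra = 0;

	if (c >= 0xF0) { c &= 0x07; extra = 3; }
	else if (c >= 0xE0) { c &= 0x0F; extra = 2; }
	else if (c >= 0xC0) { c &= 0x1F; extra = 1; }
	while (extra-- > 0 && (*s & 0xC0) == 0x80)
		c = (c << 6) | (uint32_t)(*s++ & 0x3F);
	*p = s;
	return c;
}

/* Assumed set of code points that HFS+ ignores when comparing names. */
static int is_ignorable(uint32_t c)
{
	return (c >= 0x200C && c <= 0x200F) ||
	       (c >= 0x202A && c <= 0x202E) ||
	       (c >= 0x206A && c <= 0x206F) ||
	       c == 0xFEFF;
}

/* Return the next code point that is not ignorable (0 at end of string). */
static uint32_t next_significant(const unsigned char **p)
{
	uint32_t c;

	do {
		c = next_codepoint(p);
	} while (c && is_ignorable(c));
	return c;
}

/* 1 if the leading path component is equivalent to ".git", else 0. */
static int sketch_is_hfs_dotgit(const char *path)
{
	static const char want[] = ".git";
	const unsigned char *p = (const unsigned char *)path;
	uint32_t c;
	size_t i;

	assert(path != NULL);
	for (i = 0; i < 4; i++) {
		c = next_significant(&p);
		if (c >= 'A' && c <= 'Z')
			c += 'a' - 'A';	/* ASCII-only case folding */
		if (c != (uint32_t)want[i])
			return 0;
	}
	c = next_significant(&p);
	return c == 0 || c == '/';
}
```

A caller that wants to check every component would apply the function after each '/' in the path.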
  2. 28 Oct, 2013 1 commit
  3. 24 Oct, 2013 1 commit
    • checkout_entry(): clarify the use of topath[] parameter · af2a651d
      Junio C Hamano authored
      The said function has this signature:
      	extern int checkout_entry(struct cache_entry *ce,
      				  const struct checkout *state,
      				  char *topath);
      At first glance, it might appear that the caller of checkout_entry()
      can specify to which path the contents are written out by the last
      parameter, and it is tempting to add "const" in front of its type.
      In reality, however, topath[] is to point at a buffer to store the
      temporary path generated by the callchain originating from this
      function, and the temporary path is always short, much shorter than
      the buffer prepared by its only caller in builtin/checkout-index.c.
      Document the code a bit to clarify so that future callers know how
      to use the function better.
      Noticed-by: Nguyễn Thái Ngọc Duy <[email protected]>
      Signed-off-by: Junio C Hamano <[email protected]>
  4. 14 Oct, 2013 1 commit
    • Use simpler relative_path when set_git_dir · 41894ae3
      Jiang Xin authored
      Using a relative path as git_dir first appeared in v1.5.6-1-g044bbbcb.
      It makes git_dir shorter only if git_dir is inside work_tree,
      and this improves performance. But my last refactoring of the
      relative_path function (commit v1.8.3-rc2-12-ge02ca72f) changed that.
      Always using relative_path as git_dir may cause trouble like
      The new relative_path is a combination of the original relative_path
      from path.c and the original path_relative from quote.c, so in order to
      restore the original implementation, save the original relative_path
      as remove_leading_path, and call it in setup.c.
      Suggested-by: Karsten Blees <[email protected]>
      Signed-off-by: Jiang Xin <[email protected]>
      Signed-off-by: Jonathan Nieder <[email protected]>
  5. 20 Sep, 2013 1 commit
    • format-patch: print in-body "From" only when needed · 662cc30c
      Jeff King authored
      Commit a9080475 taught format-patch the "--from" option,
      which places the author ident into an in-body from header,
      and uses the committer ident in the rfc822 from header.  The
      documentation claims that it will omit the in-body header
      when it is the same as the rfc822 header, but the code never
      implemented that behavior.
      This patch completes the feature by comparing the two idents
      and doing nothing when they are the same (this is the same
      as simply omitting the in-body header, as the two are by
      definition indistinguishable in this case). This makes it
      reasonable to turn on "--from" all the time (if it matches
      your particular workflow), rather than only using it when
      exporting other people's patches.
      Signed-off-by: Jeff King <[email protected]>
      Signed-off-by: Junio C Hamano <[email protected]>
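The decision described above, emitting the in-body header only when the two idents differ, can be sketched as follows. The struct and function names are hypothetical, and comparing name and email with strcmp() is an assumption about what "the same" means, not a description of git's code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified ident: just a name and an email address. */
struct toy_ident {
	const char *name;
	const char *email;
};

static int toy_same_ident(const struct toy_ident *a, const struct toy_ident *b)
{
	return !strcmp(a->name, b->name) && !strcmp(a->email, b->email);
}

/*
 * Return 1 when an in-body "From:" line is needed because the author
 * differs from the ident used in the rfc822 From header, 0 otherwise.
 */
static int toy_needs_in_body_from(const struct toy_ident *author,
				  const struct toy_ident *rfc822_from)
{
	assert(author != NULL && rfc822_from != NULL);
	return !toy_same_ident(author, rfc822_from);
}
```

With a check like this, passing "--from" unconditionally costs nothing when you send your own patches, which is the workflow the message has in mind.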
  6. 18 Sep, 2013 1 commit
  7. 17 Sep, 2013 1 commit
    • name-hash: refactor polymorphic index_name_exists() · db5360f3
      Eric Sunshine authored
      Depending upon the absence or presence of a trailing '/' on the incoming
      pathname, index_name_exists() checks either if a file is present in the
      index or if a directory is represented within the index. Each caller
      explicitly chooses the mode of operation by adding or removing a
      trailing '/' before invoking index_name_exists().
      Since these two modes of operation are disjoint and have no code in
      common (one searches index_state.name_hash; the other dir_hash), they
      can be represented more naturally as distinct functions: one to search
      for a file, and one for a directory.
      Splitting index searching into two functions relieves callers of the
      artificial burden of having to add or remove a slash to select the mode
      of operation; instead they just call the desired function. A subsequent
      patch will take advantage of this benefit in order to eliminate the
      requirement that the incoming pathname for a directory search must have
      a trailing slash.
      (In order to avoid disturbing in-flight topics, index_name_exists() is
      retained as a thin wrapper dispatching either to index_dir_exists() or
      Signed-off-by: Eric Sunshine <[email protected]>
      Signed-off-by: Junio C Hamano <[email protected]>
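The shape of this refactoring can be sketched as two lookup functions plus a thin wrapper that dispatches on a trailing '/'. All names below are hypothetical, and the two small arrays merely stand in for the name and directory hash tables mentioned in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for the file-name table and the directory table. */
static const char *toy_files[] = { "Makefile", "src/main.c", NULL };
static const char *toy_dirs[] = { "src/", NULL };

static int toy_lookup(const char **table, const char *name, size_t len)
{
	for (; *table; table++)
		if (strlen(*table) == len && !memcmp(*table, name, len))
			return 1;
	return 0;
}

/* One function per mode of operation. */
static int toy_file_exists(const char *name, size_t len)
{
	return toy_lookup(toy_files, name, len);
}

static int toy_dir_exists(const char *name, size_t len)
{
	return toy_lookup(toy_dirs, name, len);
}

/* Compatibility wrapper: a trailing '/' selects the directory lookup. */
static int toy_name_exists(const char *name, size_t len)
{
	assert(name != NULL);
	if (len > 0 && name[len - 1] == '/')
		return toy_dir_exists(name, len);
	return toy_file_exists(name, len);
}
```

New callers would call the specific function directly; only legacy callers need the wrapper and its slash convention.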
  8. 09 Sep, 2013 1 commit
    • git-config: always treat --int as 64-bit internally · 00160242
      Jeff King authored
      When you run "git config --int", the maximum size of integer
      you get depends on how git was compiled, and what it
      considers to be an "int".
      This is almost useful, because your scripts calling "git
      config" will behave similarly to git internally. But relying
      on this is dubious; you have to actually know how git treats
      each value internally (e.g., int versus unsigned long),
      which is not documented and is subject to change. And even
      if you know it is "unsigned long", we do not have a
      git-config option to match that behavior.
      Furthermore, you may simply be asking git to store a value
      on your behalf (e.g., configuration for a hook). In that
      case, the relevant range check has nothing at all to do with
      git, but rather with whatever scripting tools you are using
      (and git has no way of knowing what the appropriate range is
      Not only is the range check useless, but it is actively
      harmful, as there is no way at all for scripts to look
      at config variables with large values. For instance, one
      cannot reliably get the value of pack.packSizeLimit via
      git-config. On an LP64 system, git happily uses a 64-bit
      "unsigned long" internally to represent the value, but the
      script cannot read any value over 2G.
      Ideally, the "--int" option would simply represent an
      arbitrarily large integer. For practical purposes, however,
      a 64-bit integer is large enough, and is much easier to
      implement (and if somebody overflows it, we will still
      notice the problem, and not simply return garbage).
      Signed-off-by: Jeff King <[email protected]>
      Signed-off-by: Junio C Hamano <[email protected]>
  9. 03 Sep, 2013 1 commit
  10. 20 Aug, 2013 1 commit
  11. 29 Jul, 2013 1 commit
  12. 17 Jul, 2013 1 commit
  13. 15 Jul, 2013 8 commits
  14. 12 Jul, 2013 3 commits
    • sha1_object_info_extended: make type calculation optional · 5b086407
      Jeff King authored
      Each caller of sha1_object_info_extended sets up an
      object_info struct to tell the function which elements of
      the object it wants to get. Until now, getting the type of
      the object has always been required (and it is returned via
      the return type rather than a pointer in object_info).
      This can involve actually opening a loose object file to
      determine its type, or following delta chains to determine a
      packed file's base type. These effects produce a measurable
      slow-down when doing a "cat-file --batch-check" that does
      not include %(objecttype).
      This patch adds a "typep" query to struct object_info, so
      that it can be optionally queried just like size and
      disk_size. As a result, the return type of the function is
      no longer the object type, but rather 0/-1 for success/error.
      As there are only three callers total, we just fix up each
      caller rather than keep a compatibility wrapper:
        1. The simpler sha1_object_info wrapper continues to
           always ask for and return the type field.
        2. The istream_source function wants to know the type, and
           so always asks for it.
        3. The cat-file batch code asks for the type only when
           %(objecttype) is part of the format string.
      On linux.git, the best-of-five for running:
        $ git rev-list --objects --all >objects
        $ time git cat-file --batch-check='%(objectsize:disk)' <objects
      on a fully packed repository goes from:
        real    0m8.680s
        user    0m8.160s
        sys     0m0.512s
      to:
        real    0m7.205s
        user    0m6.580s
        sys     0m0.608s
      Signed-off-by: Jeff King <[email protected]>
      Signed-off-by: Junio C Hamano <[email protected]>
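The "ask only for what you need" pattern described above can be sketched as a struct of optional output pointers. Everything here is a toy: the names are hypothetical, objects are plain integers, and the counter merely makes the skipped work visible.

```c
#include <assert.h>
#include <stddef.h>

enum toy_type { TOY_BLOB = 1, TOY_TREE, TOY_COMMIT };

/* Each non-NULL pointer is a request; NULL means "do not compute this". */
struct toy_object_info {
	enum toy_type *typep;
	unsigned long *sizep;
	unsigned long *disk_sizep;
};

static int toy_type_lookups;	/* counts the "expensive" type computations */

static enum toy_type toy_expensive_type_lookup(int id)
{
	toy_type_lookups++;
	return (id % 2) ? TOY_BLOB : TOY_COMMIT;
}

/* Returns 0 on success and -1 on error; the type is no longer the return value. */
static int toy_object_info_extended(int id, struct toy_object_info *oi)
{
	assert(oi != NULL);
	if (id < 0)
		return -1;
	if (oi->typep)
		*oi->typep = toy_expensive_type_lookup(id);
	if (oi->sizep)
		*oi->sizep = 100ul + (unsigned long)id;
	if (oi->disk_sizep)
		*oi->disk_sizep = 40ul + (unsigned long)id;
	return 0;
}
```

A caller that only formats the on-disk size leaves typep as NULL and never pays for the type lookup.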
    • cat-file: disable object/refname ambiguity check for batch mode · 25fba78d
      Jeff King authored
      A common use of "cat-file --batch-check" is to feed a list
      of objects from "rev-list --objects" or a similar command.
      In this instance, all of our input objects are 40-byte sha1
      ids. However, cat-file has always allowed arbitrary revision
      specifiers, and feeds the result to get_sha1().
      Fortunately, get_sha1() recognizes a 40-byte sha1 before
      doing any hard work trying to look up refs, meaning this
      scenario should end up spending very little time converting
      the input into an object sha1. However, since 798c35fc
      (get_sha1: warn about full or short object names that look
      like refs, 2013-05-29), when we encounter this case, we
      spend the extra effort to do a refname lookup anyway, just
      to print a warning. This is further exacerbated by ca919930
      (get_packed_ref_cache: reload packed-refs file when it
      changes, 2013-06-20), which makes individual ref lookup more
      expensive by requiring a stat() of the packed-refs file for
      each missing ref.
      With no patches, this is the time it takes to run:
        $ git rev-list --objects --all >objects
        $ time git cat-file --batch-check='%(objectname)' <objects
      on the linux.git repository:
        real    1m13.494s
        user    0m25.924s
        sys     0m47.532s
      If we revert ca919930, the packed-refs up-to-date check, it
      gets a little better:
        real    0m54.697s
        user    0m21.692s
        sys     0m32.916s
      but we are still spending quite a bit of time on ref lookup
      (and we would not want to revert that patch anyway, as doing so
      would reintroduce correctness issues).  If we revert 798c35fc, disabling
      the warning entirely, we get a much more reasonable time:
        real    0m7.452s
        user    0m6.836s
        sys     0m0.608s
      This patch does the moral equivalent of this final case (and
      gets similar speedups). We introduce a global flag that
      callers of get_sha1() can use to avoid paying the price for
      the warning.
      Signed-off-by: Jeff King <[email protected]>
      Signed-off-by: Junio C Hamano <[email protected]>
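The global-flag approach can be sketched as below. The flag and function names are hypothetical, and a counter stands in for the refname lookup that is performed only to print a warning; the sketch resolves nothing but full 40-character hex ids.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static int toy_warn_ambiguous_refs = 1;	/* hypothetical global flag */
static int toy_ref_lookups;		/* counts simulated ref lookups */

static int toy_is_full_hex_id(const char *s)
{
	size_t i;

	if (strlen(s) != 40)
		return 0;
	for (i = 0; i < 40; i++)
		if (!isxdigit((unsigned char)s[i]))
			return 0;
	return 1;
}

/* Returns 0 when name is a full hex id, -1 otherwise. */
static int toy_resolve(const char *name)
{
	assert(name != NULL);
	if (!toy_is_full_hex_id(name))
		return -1;
	if (toy_warn_ambiguous_refs)
		toy_ref_lookups++;	/* lookup done only for the warning */
	return 0;
}

/* Batch caller: switch the check off for the loop, then restore it. */
static int toy_batch_check(const char **names, int n)
{
	int saved = toy_warn_ambiguous_refs;
	int i, resolved = 0;

	toy_warn_ambiguous_refs = 0;
	for (i = 0; i < n; i++)
		if (!toy_resolve(names[i]))
			resolved++;
	toy_warn_ambiguous_refs = saved;
	return resolved;
}
```

Saving and restoring the flag keeps the warning in place for every other caller, so only the batch path skips the extra lookups.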
    • teach config --blob option to parse config from database · 1bc88819
      Heiko Voigt authored
      This can be used to read configuration values directly from git's
      database. For example, it is useful for reading .gitmodules files that
      have yet to be checked out directly from the database.
      Signed-off-by: Heiko Voigt <[email protected]>
      Acked-by: Jeff King <[email protected]>
      Signed-off-by: Junio C Hamano <[email protected]>
  15. 09 Jul, 2013 1 commit
    • Convert "struct cache_entry *" to "const ..." wherever possible · 9c5e6c80
      Duy Nguyen authored
      I attempted to make index_state->cache[] a "const struct cache_entry **"
      to find out how existing entries in index are modified and where. The
      question I have is what do we do if we really need to keep track of on-disk
      changes in the index. The result is
       - diff-lib.c: setting CE_UPTODATE
       - name-hash.c: setting CE_HASHED
       - preload-index.c, read-cache.c, unpack-trees.c and
         builtin/update-index: obvious
       - entry.c: write_entry() may refresh the checked out entry via
         fill_stat_cache_info(). This causes "non-const struct cache_entry
         *" in builtin/apply.c, builtin/checkout-index.c and
       - builtin/ls-files.c: --with-tree changes stagemask and may set
      Of these, write_entry() and its call sites are probably most
      interesting because it modifies on-disk info. But this is stat info
      and can be retrieved via refresh, at least for porcelain
      commands. The others just use ce_flags for local purposes.
      So, keeping track of "dirty" entries is just a matter of setting a
      flag in index modification functions exposed by read-cache.c. Except
      unpack-trees, the rest of the code base does not do anything funny
      behind read-cache's back.
      The actual patch is less valuable than the summary above. But if
      anyone wants to re-identify the above sites, applying this patch, then
          diff --git a/cache.h b/cache.h
          index 430d021..1692891 100644
          --- a/cache.h
          +++ b/cache.h
          @@ -267,7 +267,7 @@ static inline unsigned int canon_mode(unsigned int mode)
           #define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)
           struct index_state {
          -	struct cache_entry **cache;
          +	const struct cache_entry **cache;
           	unsigned int version;
           	unsigned int cache_nr, cache_alloc, cache_changed;
           	struct string_list *resolve_undo;
      will help quickly identify them without bogus warnings.
      Signed-off-by: Nguyễn Thái Ngọc Duy <[email protected]>
      Signed-off-by: Junio C Hamano <[email protected]>
  16. 08 Jul, 2013 1 commit
    • cache.h: move remote/connect API out of it · 47a59185
      Junio C Hamano authored
      The definition of "struct ref" in "cache.h", a header file so
      central to the system, always confused me.  This structure is not
      about the local ref used by sha1-name API to name local objects.
      It is what refspecs are expanded into, after finding out what refs
      the other side has, to define what refs are updated after object
      transfer succeeds to what values.  It belongs to "remote.h" together
      with "struct refspec".
      While we are at it, also move the types and functions related to the
      Git transport connection to a new header file, connect.h.
      Signed-off-by: Junio C Hamano <[email protected]>
  17. 07 Jul, 2013 1 commit
    • teach sha1_object_info_extended a "disk_size" query · 161f00e7
      Jeff King authored
      Using sha1_object_info_extended, a caller can find out the
      type of an object, its size, and information about where it
      is stored. In addition to the object's "true" size, it can
      also be useful to know the size that the object takes on
      disk (e.g., to generate statistics about which refs consume
      This patch adds a "disk_sizep" field to "struct object_info",
      and fills it in during sha1_object_info_extended if it is
      Signed-off-by: Jeff King <[email protected]>
      Signed-off-by: Junio C Hamano <[email protected]>
  18. 26 Jun, 2013 1 commit
    • path.c: refactor relative_path(), not only strip prefix · e02ca72f
      Jiang Xin authored
      The original design of relative_path() is simple: just strip the prefix
      (*base) from the absolute path (*abs).
      In most cases, we need a real relative path, such as ../foo or
      ../../bar.  That's why there is another reimplementation
      (path_relative()) in quote.c.
      Borrow some code from path_relative() in quote.c to refactor
      relative_path() in path.c, so that it can return a real relative
      path, and callers can reuse this function without reimplementing
      their own.  The function path_relative() in quote.c will be
      replaced, and I will use the new relative_path() function when
      implementing the interactive git-clean later.
      Different results for relative_path() before and after this refactor:
          abs path  base path  relative (original)  relative (refactor)
          ========  =========  ===================  ===================
          /a/b      /a/b       .                    ./
          /a/b/     /a/b       .                    ./
          /a        /a/b/      /a                   ../
          /         /a/b/      /                    ../../
          /a/c      /a/b/      /a/c                 ../c
          /x/y      /a/b/      /x/y                 ../../x/y
          a/b/      a/b/       .                    ./
          a/b/      a/b        .                    ./
          a         a/b        a                    ../
          x/y       a/b/       x/y                  ../../x/y
          a/c       a/b        a/c                  ../c
          (empty)   (null)     (empty)              ./
          (empty)   (empty)    (empty)              ./
          (empty)   /a/b       (empty)              ./
          (null)    (null)     (null)               ./
          (null)    (empty)    (null)               ./
          (null)    /a/b       (segfault)           ./
      You may notice that the return value "." has been changed to "./".
      This is because:
       * Function quote_path_relative() in quote.c will show the relative
         path as "./" if abs(in) and base(prefix) are the same.
       * Function relative_path() is called only once (in setup.c), and
         it is OK for it to return "./" instead of ".".
      Signed-off-by: Jiang Xin <[email protected]>
      Signed-off-by: Junio C Hamano <[email protected]>
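For contrast with the refactored behavior, the original "strip the prefix" design can be sketched as follows. The function name is hypothetical and this is not the code from path.c; it is written only to reproduce several rows of the "relative (original)" column in the table above, and it does not handle NULL inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * If path lies at or below base, return the part after base ("." when
 * nothing is left). Otherwise return path unchanged, which is why the
 * old results were not always real relative paths.
 */
static const char *toy_strip_prefix(const char *path, const char *base)
{
	size_t blen;

	assert(path != NULL && base != NULL);
	blen = strlen(base);
	while (blen > 0 && base[blen - 1] == '/')
		blen--;			/* ignore trailing slashes on base */
	if (blen == 0 || strncmp(path, base, blen))
		return path;		/* base is not a prefix of path */
	if (path[blen] != '/' && path[blen] != '\0')
		return path;		/* prefix ends in mid-component */
	path += blen;
	while (*path == '/')
		path++;
	return *path ? path : ".";
}
```

A path outside base, such as /a/c against /a/b/, comes back untouched, whereas the refactored function in the table returns ../c.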
  19. 20 Jun, 2013 2 commits
  20. 09 Jun, 2013 1 commit
  21. 02 Jun, 2013 2 commits
  22. 12 May, 2013 1 commit
    • refactor "ref->merge" flag · 900f2814
      Jeff King authored
      Each "struct ref" has a boolean flag that is set by the
      fetch code to determine whether the ref should be marked as
      "not-for-merge" or not when we write it out to FETCH_HEAD.
      It would be useful to turn this boolean into a tri-state,
      with the third state meaning "do not bother writing it out
      to FETCH_HEAD at all". That would let us add extra refs to
      the set of refs to be stored (e.g., to store copies of
      things we fetched) without impacting FETCH_HEAD.
      This patch turns it into an enum that covers the tri-state
      case, and hopefully makes the code more explicit and easier
      to read.
      Signed-off-by: Jeff King <[email protected]>
      Signed-off-by: Junio C Hamano <[email protected]>
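Turning the boolean into a tri-state can be sketched with an enum and a writer that skips the third state. The enum constants, the function name, and the simplified line layout are all assumptions made for illustration, not git's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical tri-state replacing the boolean "merge" flag. */
enum toy_fetch_head_status {
	TOY_FETCH_HEAD_MERGE,
	TOY_FETCH_HEAD_NOT_FOR_MERGE,
	TOY_FETCH_HEAD_IGNORE
};

/*
 * Format one FETCH_HEAD-style line into buf. Returns the number of
 * characters snprintf() reports, or 0 when the ref is skipped.
 */
static int toy_format_fetch_head(char *buf, size_t size, const char *hex,
				 enum toy_fetch_head_status status,
				 const char *note)
{
	assert(buf != NULL && size > 0);
	buf[0] = '\0';
	switch (status) {
	case TOY_FETCH_HEAD_IGNORE:
		return 0;	/* stored elsewhere, never written out */
	case TOY_FETCH_HEAD_MERGE:
		return snprintf(buf, size, "%s\t\t%s\n", hex, note);
	case TOY_FETCH_HEAD_NOT_FOR_MERGE:
		return snprintf(buf, size, "%s\tnot-for-merge\t%s\n", hex, note);
	}
	return 0;
}
```

Compared with a boolean, the explicit switch makes the "do not write at all" case impossible to overlook at the call site.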
  23. 17 Apr, 2013 3 commits
  24. 05 Apr, 2013 1 commit
  25. 03 Apr, 2013 1 commit
    • add -u: only show pathless 'add -u' warning when changes exist outside cwd · 71c7b053
      Jonathan Nieder authored
      A common workflow in large projects is to chdir into a subdirectory of
      interest and only do work there:
      	cd src
      	vi foo.c
      	make test
      	git add -u
      	git commit
      The upcoming change to 'git add -u' behavior would not affect such a
      workflow: when the only changes present are in the current directory,
      'git add -u' will add all changes, and whether that happens via an
      implicit "." or implicit ":/" parameter is an unimportant
      implementation detail.
      The warning about use of 'git add -u' with no pathspec is annoying
      because it seemingly serves no purpose in this case.  So suppress the
      warning unless there are changes outside the cwd that are not being
      A previous version of this patch ran two I/O-intensive diff-files
      passes: one to find changes outside the cwd, and another to find
      changes to add to the index within the cwd.  This version runs one
      full-tree diff and decides for each change whether to add it or warn
      and suppress it in update_callback.  As a result, even on very large
      repositories "git add -u" will not be significantly slower than the
      future default behavior ("git add -u :/"), and the slowdown relative
      to "git add -u ." should be a useful clue to users of such
      repositories to get into the habit of explicitly passing '.'.
      Signed-off-by: Jonathan Nieder <[email protected]>
      Acked-by: Jeff King <[email protected]>
      Improved-by: Junio C Hamano <[email protected]>
      Signed-off-by: Junio C Hamano <[email protected]>
  26. 27 Mar, 2013 1 commit
    • checkout: avoid unnecessary match_pathspec calls · e721c154
      Duy Nguyen authored
      In checkout_paths() we do this:
       - for all updated items, call match_pathspec
       - for all items, call match_pathspec (inside unmerge_cache)
       - for all items, call match_pathspec (for showing "path .. is unmerged")
       - for updated items, call match_pathspec and update paths
      That's a lot of duplicate match_pathspec(s) and the function is not
      exactly cheap to be called so many times, especially on large indexes.
      This patch makes it call match_pathspec once per updated index entry,
      save the result in ce_flags and reuse the results in the following
      The changes in 0a1283bc (checkout $tree $path: do not clobber local
      changes in $path not in $tree - 2011-09-30) limit the affected paths
      to ones we read from $tree. We do not do anything to other modified
      entries in this case, so the "for all items" above could be modified
      to "for all updated items". But..
      The command's behavior is now modified slightly: unmerged entries that
      match $path, but are not updated by $tree, are now NOT touched.  This
      should be considered a bug fix, though, not a regression. A new test is
      added for this change.
      And while at it, free ps_matched after use.
      The following command is tested on webkit, 215k entries. The pattern
      is chosen mainly to make match_pathspec sweat:
      git checkout -- "*[a-zA-Z]*[a-zA-Z]*[a-zA-Z]*"
              before      after
      real    0m3.493s    0m2.737s
      user    0m2.239s    0m1.586s
      sys     0m1.252s    0m1.151s
      Signed-off-by: Nguyễn Thái Ngọc Duy <[email protected]>
      Signed-off-by: Junio C Hamano <[email protected]>
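The "match once, remember the answer in a flag" idea can be sketched as follows. The flag bit, struct, and function names are hypothetical, and a prefix comparison stands in for the real, more expensive pathspec matching.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_CE_MATCHED (1u << 0)	/* hypothetical per-entry flag bit */

struct toy_entry {
	const char *name;
	unsigned int flags;
};

static int toy_match_calls;	/* counts calls to the "expensive" matcher */

/* Stand-in for pathspec matching: a plain prefix comparison. */
static int toy_match_pathspec(const char *name, const char *prefix)
{
	toy_match_calls++;
	return !strncmp(name, prefix, strlen(prefix));
}

/* First pass: run the matcher once per entry and cache the result. */
static void toy_mark_matches(struct toy_entry *e, size_t n, const char *prefix)
{
	size_t i;

	assert(e != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (toy_match_pathspec(e[i].name, prefix))
			e[i].flags |= TOY_CE_MATCHED;
		else
			e[i].flags &= ~TOY_CE_MATCHED;
	}
}

/* Later passes consult the flag instead of calling the matcher again. */
static size_t toy_count_matched(const struct toy_entry *e, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (e[i].flags & TOY_CE_MATCHED)
			count++;
	return count;
}
```

However many later passes inspect the entries, the matcher runs once per entry, which is where the measured saving comes from.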
  27. 17 Mar, 2013 1 commit
    • archive-zip: use deflateInit2() to ask for raw compressed data · c3c2e1a0
      René Scharfe authored
      We use the function git_deflate_init() -- which wraps the zlib function
      deflateInit() -- to initialize compression of ZIP file entries.  This
      results in compressed data prefixed with a two-byte header and
      followed by a four-byte trailer.  ZIP file entries consist of ZIP
      headers and raw compressed data instead, so we remove the zlib wrapper
      before writing the result.
      We can ask zlib for the raw compressed data without the unwanted
      parts in the first place by using deflateInit2() and specifying a
      negative number of bits to size the window.  For that purpose, factor
      out the function do_git_deflate_init() and add git_deflate_init_raw(),
      which wraps it.  Then use the latter in archive-zip.c and get rid of
      the code that stripped the zlib header and trailer.
      Also rename the helper function zlib_deflate() to zlib_deflate_raw()
      to reflect the change.
      Thus we avoid generating data that we throw away anyway, the code
      becomes shorter and some magic constants are removed.
      Signed-off-by: Rene Scharfe <[email protected]>
      Signed-off-by: Junio C Hamano <[email protected]>
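To make the removed step concrete, here is a minimal sketch of the kind of wrapper stripping that becomes unnecessary once zlib is asked for raw deflate data. The function name and constants are hypothetical; the sizes (two-byte header, four-byte trailer) come from the message above, and the code does not call zlib at all.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ZLIB_HEADER_LEN 2	/* zlib stream header */
#define TOY_ZLIB_TRAILER_LEN 4	/* checksum trailer */

/*
 * Given a complete zlib-wrapped buffer, return a pointer to the raw
 * deflate data inside it and store its length in *raw_len. Returns
 * NULL if the buffer is too short to contain the wrapper.
 */
static const unsigned char *toy_strip_zlib_wrapper(const unsigned char *buf,
						   size_t len, size_t *raw_len)
{
	assert(buf != NULL && raw_len != NULL);
	if (len < TOY_ZLIB_HEADER_LEN + TOY_ZLIB_TRAILER_LEN)
		return NULL;
	*raw_len = len - TOY_ZLIB_HEADER_LEN - TOY_ZLIB_TRAILER_LEN;
	return buf + TOY_ZLIB_HEADER_LEN;
}
```

With deflateInit2() and a negative windowBits value, zlib emits only the raw deflate data, so offsets like these no longer appear in the caller.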