1. 22 May, 2018 2 commits
    • is_ntfs_dotgit: match other .git files · e7cb0b44
      Johannes Schindelin authored
      When we started to catch NTFS short names that clash with .git, we only
      looked for GIT~1. This is sufficient because we only ever clone into an
      empty directory, so .git is guaranteed to be the first subdirectory or
      file in that directory.
      However, even with a fresh clone, .gitmodules is *not* necessarily the
      first file to be written that would want the NTFS short name GITMOD~1: a
      malicious repository can add .gitmodul0000 and friends, which sort
      before `.gitmodules` and are therefore checked out *first*. For that
      reason, we have to test not only for ~1 short names, but for others,
      too.
      It's hard to just adapt the existing checks in is_ntfs_dotgit(): since
      Windows 2000 (i.e., in all Windows versions still supported by Git),
      NTFS short names are only generated in the <prefix>~<number> form up to
      number 4. After that, a *different* prefix is used, calculated from the
      long file name using an undocumented, but stable algorithm.
      For example, the short name of .gitmodules would be GITMOD~1, but if it
      is taken, and all of ~2, ~3 and ~4 are taken, too, the short name
      GI7EBA~1 will be used. From there, collisions are handled by
      incrementing the number, shortening the prefix as needed (until ~9999999
      is reached, in which case NTFS will not allow the file to be created).
      We'd also want to handle .gitignore and .gitattributes, which suffer
      from a similar problem, using the fall-back short names GI250A~1 and
      GI7D29~1, respectively.
      To accommodate that, we could reimplement the hashing algorithm, but
      it is just safer and simpler to provide the known prefixes. This
      algorithm has been reverse-engineered and described at
      https://usn.pw/blog/gen/2015/06/09/filenames/, which is defunct but
      still available via https://web.archive.org/.
      These can be recomputed by running the following Perl script:
      -- snip --
      use warnings;
      use strict;
      sub compute_short_name_hash ($) {
              my $checksum = 0;
              foreach (split('', $_[0])) {
                      $checksum = ($checksum * 0x25 + ord($_)) & 0xffff;
              }
              $checksum = ($checksum * 314159269) & 0xffffffff;
              $checksum = 1 + (~$checksum & 0x7fffffff) if ($checksum & 0x80000000);
              $checksum -= (($checksum * 1152921497) >> 60) * 1000000007;
              return scalar reverse sprintf("%x", $checksum & 0xffff);
      }
      print compute_short_name_hash($ARGV[0]);
      -- snap --
      E.g., running that with the argument ".gitignore" will
      result in "250a" (which then becomes "gi250a" in the code).
      Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      Signed-off-by: Jeff King <peff@peff.net>
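      As a cross-check, the Perl script above can be ported to C. This is a hypothetical sketch, not code from Git: compute_short_name_hash() only mirrors the Perl name. It assumes ASCII file names and a Perl built with 64-bit integers, whose arithmetic uint64_t reproduces here. Every intermediate product stays below 2^64, so nothing overflows.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical C port of the Perl compute_short_name_hash() above.
 * Writes the reversed, lowercase hex of the 16-bit hash into out
 * (which must hold at least 5 bytes) and returns out.
 */
static char *compute_short_name_hash(const char *name, char out[5])
{
	uint64_t checksum = 0;	/* 64 bits, like the Perl integers */
	size_t len, i;

	/* Rolling 16-bit checksum over the long file name. */
	for (; *name; name++)
		checksum = (checksum * 0x25 + (unsigned char)*name) & 0xffff;

	checksum = (checksum * 314159269) & 0xffffffff;
	if (checksum & 0x80000000)
		checksum = 1 + (~checksum & 0x7fffffff);
	/* Multiply-and-shift reduction: leaves checksum % 1000000007. */
	checksum -= ((checksum * 1152921497) >> 60) * 1000000007;

	/* "%x" is not zero-padded, matching Perl's sprintf("%x"). */
	snprintf(out, 5, "%x", (unsigned int)(checksum & 0xffff));
	len = strlen(out);
	assert(len >= 1 && len <= 4);

	/* Reverse the hex digits, as Perl's "scalar reverse" does. */
	for (i = 0; i < len / 2; i++) {
		char tmp = out[i];

		out[i] = out[len - 1 - i];
		out[len - 1 - i] = tmp;
	}
	return out;
}
```

      Worked by hand, ".gitignore" and ".gitmodules" come out as "250a" and "7eba". Those values agree with the gi250a and GI7EBA~1 names mentioned above.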
    • is_ntfs_dotgit: use a size_t for traversing string · 11a9f4d8
      Jeff King authored
      We walk through the "name" string using an int, which can
      wrap to a negative value and cause us to read random memory
      before our array (e.g., by creating a tree with a name >2GB,
      since "int" is still 32 bits even on most 64-bit platforms).
      Worse, this is easy to trigger during the fsck_tree() check,
      which is supposed to be protecting us from malicious trees.
      Note one bit of trickiness in the existing code: we
      sometimes assign -1 to "len" at the end of the loop, and
      then rely on the "len++" in the for-loop's increment to take
      it back to 0. This is still legal with a size_t, since
      assigning -1 will turn into SIZE_MAX, which then wraps
      around to 0 on increment.
      Signed-off-by: Jeff King <peff@peff.net>
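      The "len = -1, then len++" trick can be shown with a small helper. count_dotgit_components() is hypothetical, not a Git function. It walks a path with a size_t index and resets the index at every directory separator. The reset stores SIZE_MAX, which the loop's increment wraps back to 0, because unsigned arithmetic is defined to wrap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count the components of a path that are exactly
 * ".git", walking the string with a size_t (so a very long name cannot
 * wrap to a negative index) and the "len = -1" reset idiom.
 */
static size_t count_dotgit_components(const char *name)
{
	size_t count = 0;
	size_t len;

	assert(name != NULL);
	for (len = 0; ; len++) {
		if (name[len] == '\0' || name[len] == '/' || name[len] == '\\') {
			if (len == 4 && !strncmp(name, ".git", 4))
				count++;
			if (name[len] == '\0')
				return count;
			name += len + 1;	/* start of the next component */
			len = -1;		/* SIZE_MAX; the loop's len++ wraps it to 0 */
		}
	}
}
```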
  2. 23 Mar, 2018 1 commit
    • repository: introduce raw object store field · 90c62155
      Stefan Beller authored
      The raw object store field will contain any objects needed for access
      to objects in a given repository.
      This patch introduces the raw object store and populates it with the
      `objectdir`, which used to be part of the repository struct.
      As the struct gains members, we'll also populate the function to clear
      the memory for these members.
      In a later step, we'll introduce a struct object_parser that will
      complement the object parsing in a repository struct: the raw object
      parser is the layer that will provide access to raw object content,
      while the higher-level object parser code will parse raw objects and
      keep track of parenthood and other object relationships using 'struct
      object'. For now, only add the lower level to the repository struct.
      Signed-off-by: Stefan Beller <sbeller@google.com>
      Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
      Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  3. 25 Oct, 2017 1 commit
  4. 04 Oct, 2017 1 commit
    • path.c: fix uninitialized memory access · 8262715b
      Jeff King authored
      In cleanup_path() we pass in a char array, run memcmp() on it, and
      walk through it without ever checking whether anything is in the
      array in the first place. This can lead us to access uninitialized
      memory, for
      example in t5541-http-push-smart.sh test 7, when run under valgrind:
      ==4423== Conditional jump or move depends on uninitialised value(s)
      ==4423==    at 0x242FA9: cleanup_path (path.c:35)
      ==4423==    by 0x242FA9: mkpath (path.c:456)
      ==4423==    by 0x256CC7: refname_match (refs.c:364)
      ==4423==    by 0x26C181: count_refspec_match (remote.c:1015)
      ==4423==    by 0x26C181: match_explicit_lhs (remote.c:1126)
      ==4423==    by 0x26C181: check_push_refs (remote.c:1409)
      ==4423==    by 0x2ABB4D: transport_push (transport.c:870)
      ==4423==    by 0x186703: push_with_options (push.c:332)
      ==4423==    by 0x18746D: do_push (push.c:409)
      ==4423==    by 0x18746D: cmd_push (push.c:566)
      ==4423==    by 0x1183E0: run_builtin (git.c:352)
      ==4423==    by 0x11973E: handle_builtin (git.c:539)
      ==4423==    by 0x11973E: run_argv (git.c:593)
      ==4423==    by 0x11973E: main (git.c:698)
      ==4423==  Uninitialised value was created by a heap allocation
      ==4423==    at 0x4C2CD8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
      ==4423==    by 0x4C2F195: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
      ==4423==    by 0x2C196B: xrealloc (wrapper.c:137)
      ==4423==    by 0x29A30B: strbuf_grow (strbuf.c:66)
      ==4423==    by 0x29A30B: strbuf_vaddf (strbuf.c:277)
      ==4423==    by 0x242F9F: mkpath (path.c:454)
      ==4423==    by 0x256CC7: refname_match (refs.c:364)
      ==4423==    by 0x26C181: count_refspec_match (remote.c:1015)
      ==4423==    by 0x26C181: match_explicit_lhs (remote.c:1126)
      ==4423==    by 0x26C181: check_push_refs (remote.c:1409)
      ==4423==    by 0x2ABB4D: transport_push (transport.c:870)
      ==4423==    by 0x186703: push_with_options (push.c:332)
      ==4423==    by 0x18746D: do_push (push.c:409)
      ==4423==    by 0x18746D: cmd_push (push.c:566)
      ==4423==    by 0x1183E0: run_builtin (git.c:352)
      ==4423==    by 0x11973E: handle_builtin (git.c:539)
      ==4423==    by 0x11973E: run_argv (git.c:593)
      ==4423==    by 0x11973E: main (git.c:698)
      Avoid this by using skip_prefix(), which knows not to go beyond the
      end of the string.
      Reported-by: Thomas Gummerer <t.gummerer@gmail.com>
      Signed-off-by: Jeff King <peff@peff.net>
      Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
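      A minimal sketch of the fix, assuming cleanup_path() only needs to drop a leading "./" plus any extra slashes after it. skip_prefix() here is a local stand-in with the same contract as Git's helper: on a match it returns 1 and advances *out, otherwise it returns 0. Because strncmp() stops at the first mismatch or NUL, a short or empty string is never read past its end.

```c
#include <assert.h>
#include <string.h>

/* Local stand-in with the same contract as Git's skip_prefix(). */
static int skip_prefix(const char *str, const char *prefix, const char **out)
{
	size_t n = strlen(prefix);

	if (strncmp(str, prefix, n))
		return 0;
	*out = str + n;
	return 1;
}

/* Sketch of cleanup_path(): drop a leading "./" and any slashes after it. */
static const char *cleanup_path(const char *path)
{
	if (skip_prefix(path, "./", &path)) {
		while (*path == '/')
			path++;
	}
	return path;
}
```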
  5. 02 Oct, 2017 1 commit
  6. 27 Sep, 2017 3 commits
    • validate_headref: use get_oid_hex for detached HEADs · 0bca165f
      Jeff King authored
      If a candidate HEAD isn't a symref, we check that it
      contains a viable sha1. But in a post-sha1 world, we should
      be checking whether it has any plausible object-id.
      We can do that by switching to get_oid_hex().
      Note that both before and after this patch, we only check
      for a plausible object id at the start of the file, and then
      call that good enough.  We ignore any content _after_ the
      hex, so a string like:
        0123456789012345678901234567890123456789 foo
      is accepted. Though we do put extra bytes like this into
      some pseudorefs (e.g., FETCH_HEAD), we don't typically do so
      with HEAD. We could tighten this up by using parse_oid_hex(), like:
        if (!parse_oid_hex(buffer, &oid, &end) &&
            *end++ == '\n' && *end == '\0')
                return 0;
      But we're probably better off remaining on the loose side. We're
      just checking here for a plausible-looking repository
      directory, so heuristics are acceptable (if we really want
      to be meticulous, we should use the actual ref code to parse
      it).
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
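      The loose check the message settles on and the stricter parse_oid_hex() variant can be compared in a short sketch. starts_with_hex() is a hypothetical stand-in for get_oid_hex(). hexsz is the hex length of an object id (40 for SHA-1); it is passed in so the sketch does not assume one hash size.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for get_oid_hex(): does buf begin with hexsz hex
 * digits? isxdigit('\0') is false, so this never reads past a NUL.
 */
static int starts_with_hex(const char *buf, size_t hexsz)
{
	size_t i;

	for (i = 0; i < hexsz; i++)
		if (!isxdigit((unsigned char)buf[i]))
			return 0;
	return 1;
}

/* Loose check, as validate_headref() does: ignore anything after the hex. */
static int loose_detached_head(const char *buf, size_t hexsz)
{
	return starts_with_hex(buf, hexsz);
}

/* Strict check, mirroring the parse_oid_hex() snippet: hex, "\n", then NUL. */
static int strict_detached_head(const char *buf, size_t hexsz)
{
	const char *end = buf + hexsz;

	return starts_with_hex(buf, hexsz) && *end++ == '\n' && *end == '\0';
}
```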
    • validate_headref: use skip_prefix for symref parsing · 7eb4b9d0
      Jeff King authored
      Since the previous commit guarantees that our symref buffer
      is NUL-terminated, we can just use skip_prefix() and friends
      to parse it. This is shorter and saves us having to deal
      with magic numbers and keeping the "len" counter up to date.
      While we're at it, let's rename the rather obscure "buf" to
      "refname", since that is the thing we are parsing with it.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
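      Once the buffer is NUL-terminated, symref parsing reduces to a prefix check. The sketch below assumes the accepted form is "ref:" followed by optional whitespace and a name starting with "refs/". skip_prefix() is again a local stand-in for Git's helper, and looks_like_symref() is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Local stand-in with the same contract as Git's skip_prefix(). */
static int skip_prefix(const char *str, const char *prefix, const char **out)
{
	size_t n = strlen(prefix);

	if (strncmp(str, prefix, n))
		return 0;
	*out = str + n;
	return 1;
}

/* Hypothetical helper: does a NUL-terminated HEAD buffer look like a symref? */
static int looks_like_symref(const char *buffer)
{
	const char *refname;

	if (!skip_prefix(buffer, "ref:", &refname))
		return 0;
	while (isspace((unsigned char)*refname))
		refname++;
	return !strncmp(refname, "refs/", 5);
}
```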
    • validate_headref: NUL-terminate HEAD buffer · 6e68c914
      Jeff King authored
      When we are checking to see if we have a git repo, we peek
      into the HEAD file and see if it's a plausible symlink,
      symref, or detached HEAD.
      For the latter two, we read the contents with read_in_full(),
      which means they aren't NUL-terminated. The symref check is
      careful to respect the length we got, but the sha1 check
      will happily parse up to 40 bytes, even if we read fewer. For
      example, running:
        echo 1234 >.git/HEAD
        git rev-parse
      will parse 36 uninitialized bytes from our stack buffer.
      This isn't a big deal in practice. Our buffer is 256 bytes,
      so we know we'll never read outside of it. The worst case is
      that the uninitialized bytes look like valid hex, and we
      claim a bogus HEAD file is valid. The chances of this
      happening randomly are quite slim, but let's be careful.
      One option would be to check that "len == 41" before feeding
      the buffer to get_sha1_hex(). But we'd like to eventually
      prepare for a world with variable-length hashes. Let's
      NUL-terminate as soon as we've read the buffer (we already
      even leave a spare byte to do so!). That fixes this problem
      without depending on the size of an object id.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
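      The "reserve a spare byte, then terminate" pattern looks roughly like this. read_nul_terminated() is a hypothetical helper that uses stdio's fread() instead of Git's read_in_full(). The point is only that buf[len] = '\0' runs right after the read, so no later parser can reach bytes that were never filled in.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: read at most size - 1 bytes, keeping one byte
 * spare, and NUL-terminate immediately. Returns the byte count or -1.
 */
static long read_nul_terminated(FILE *fp, char *buf, size_t size)
{
	size_t len;

	assert(size > 0);
	len = fread(buf, 1, size - 1, fp);
	if (ferror(fp))
		return -1;
	buf[len] = '\0';	/* later parsers stop here, not in stale bytes */
	return (long)len;
}
```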
  7. 23 Aug, 2017 1 commit
  8. 28 Jul, 2017 1 commit
  9. 24 Jun, 2017 8 commits
  10. 15 Apr, 2017 1 commit
  11. 27 Mar, 2017 1 commit
  12. 13 Mar, 2017 1 commit
  13. 16 Dec, 2016 1 commit
    • normalize_path_copy(): fix pushing to //server/share/dir on Windows · 7814fbe3
      Johannes Sixt authored
      normalize_path_copy() is not prepared to keep the double-slash of a
      //server/share/dir kind of path, but treats it like a regular POSIX
      style path and transforms it to /server/share/dir.
      The bug manifests when 'git push //server/share/dir master' is run,
      because tmp_objdir_add_as_alternate() uses the path in normalized
      form when it registers the quarantine object database via
      link_alt_odb_entries(). Needless to say, the directory cannot be
      accessed using the wrongly normalized path.
      Fix it by skipping all of the root part, not just a potential drive
      prefix. offset_1st_component() takes care of this; see the
      implementation in compat/mingw.c::mingw_offset_1st_component().
      Signed-off-by: Johannes Sixt <j6t@kdbg.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
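      To make "the root part" concrete, the sketch below computes its length for drive, POSIX, and UNC paths. offset_root() is a hypothetical name. Its UNC handling (skip the server name, then the share name) is an assumption based on the description above; it is not a copy of mingw_offset_1st_component().

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static int is_dir_sep(char c)
{
	return c == '/' || c == '\\';
}

/*
 * Hypothetical sketch: length of the root part that normalization must
 * leave untouched ("C:/", "/", or "//server/share/"). Returns 0 for a
 * relative path, or for "//server" with nothing after the server name.
 */
static size_t offset_root(const char *path)
{
	const char *pos = path;

	if (isalpha((unsigned char)pos[0]) && pos[1] == ':') {
		pos += 2;				/* drive prefix, e.g. "C:" */
	} else if (is_dir_sep(pos[0]) && is_dir_sep(pos[1])) {
		pos = strpbrk(pos + 2, "\\/");		/* end of the server name */
		if (!pos)
			return 0;			/* malformed UNC path */
		do
			pos++;
		while (*pos && !is_dir_sep(*pos));	/* skip the share name */
	}
	return (size_t)(pos - path) + is_dir_sep(*pos);
}
```

      A normalizer built on this would copy the first offset_root(path) bytes verbatim, so the leading "//" of a UNC path survives.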
  14. 26 Oct, 2016 1 commit
    • hex: make wraparound of the index into ring-buffer explicit · bb84735c
      René Scharfe authored
      Overflow is defined for unsigned integers, but not for signed ones.
      We could make the ring-buffer index in sha1_to_hex() and
      get_pathname() unsigned to be on the safe side to resolve this, but
      let's make it explicit that we are wrapping around at whatever
      number of elements the ring-buffer has.  The compiler is smart enough
      to turn the modulus into a bitmask for these codepaths that use
      ring-buffers of a size that is a power of 2.
      Signed-off-by: René Scharfe <l.s.r@web.de>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
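      A ring buffer with explicit wraparound might look like this. u32_to_hex() and RING_SIZE are hypothetical; they mimic the shape of sha1_to_hex() without depending on Git. bufno is reduced modulo RING_SIZE on every call, so it never grows and cannot overflow.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

#define RING_SIZE 4	/* a power of 2, so % can compile down to a bitmask */

/*
 * Hypothetical sha1_to_hex()-style helper: each call formats into the
 * next of RING_SIZE static buffers, so a few results can be used at once.
 */
static const char *u32_to_hex(uint32_t v)
{
	static int bufno;
	static char ring[RING_SIZE][9];
	char *buf;

	/* Explicit wraparound: bufno always stays in [0, RING_SIZE). */
	bufno = (bufno + 1) % RING_SIZE;
	buf = ring[bufno];
	snprintf(buf, sizeof(ring[0]), "%08" PRIx32, v);
	return buf;
}
```

      Callers may keep up to RING_SIZE results alive at once. The next call reuses the oldest buffer.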
  15. 01 Sep, 2016 1 commit
    • allow do_submodule_path to work even if submodule isn't checked out · 99b43a61
      Jacob Keller authored
      Currently, do_submodule_path will attempt locating the .git directory by
      using read_gitfile on <path>/.git. If this fails it just assumes the
      <path>/.git is actually a git directory.
      This is good because it allows for handling submodules which were cloned
      in a regular manner first before being added to the superproject.
      Unfortunately this fails if the <path> is not actually checked out any
      longer, such as by removing the directory.
      Fix this by checking whether the directory we found is actually a
      gitdir. If it is not, attempt to look up the submodule configuration
      and find the name of where it is stored in the .git/modules/ directory
      of the superproject.
      We may fail to locate the submodule configuration if, for example, a
      submodule gitlink was added but the corresponding .gitmodules file was
      not properly updated.  A die() here would not be
      pleasant to the users of submodule diff formats, so instead, modify
      do_submodule_path() to return an error code:
       - git_pathdup_submodule() returns NULL when we fail to find a path.
       - strbuf_git_path_submodule() propagates the error code to the caller.
      Modify the callers of these functions to check the error code and fail
      properly. This ensures we don't attempt to use a bad path that doesn't
      match the corresponding submodule.
      Because this change fixes add_submodule_odb() to work even if the
      submodule is not checked out, update the wording of the submodule log
      diff format to correctly display that the submodule is "not initialized"
      instead of "not checked out".
      Add tests to ensure this change works as expected.
      Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  16. 16 Aug, 2016 1 commit
    • rev-parse: respect core.hooksPath in --git-path · 9445b492
      Johannes Schindelin authored
      The idea of the --git-path option is not only to avoid having to
      prefix paths with the output of --git-dir all the time, but also to
      respect overrides for specific common paths inside the .git directory
      (e.g. `git rev-parse --git-path objects` will report the value of the
      environment variable GIT_OBJECT_DIRECTORY, if set).
      When introducing the core.hooksPath setting, we forgot to adjust
      git_path() accordingly. This patch fixes that.
      While at it, revert the special-casing of core.hooksPath in
      run-command.c, as it is now no longer needed.
      Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  17. 19 Jul, 2016 1 commit
  18. 06 May, 2016 1 commit
    • typofix: assorted typofixes in comments, documentation and messages · 832c0e5e
      Li Peng authored
      Many instances of duplicate words (e.g. "the the path") and
      a few typos are fixed, originally in multiple patches.
          wildmatch: fix duplicate words of "the"
          t: fix duplicate words of "output"
          transport-helper: fix duplicate words of "read"
          Git.pm: fix duplicate words of "return"
          path: fix duplicate words of "look"
          pack-protocol.txt: fix duplicate words of "the"
          precompose-utf8: fix typo of "sequences"
          split-index: fix typo
          worktree.c: fix typo
          remote-ext: fix typo
          utf8: fix duplicate words of "the"
          git-cvsserver: fix duplicate words
      Signed-off-by: Li Peng <lip@dtdream.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  19. 22 Apr, 2016 2 commits
  20. 23 Mar, 2016 1 commit
  21. 11 Mar, 2016 1 commit
  22. 12 Jan, 2016 1 commit
    • Refactor skipping DOS drive prefixes · 2f36eed9
      Johannes Schindelin authored
      Junio noticed that there is an implicit assumption in pretty much
      all the code calling has_dos_drive_prefix(): it forces all of its
      callsites to hardcode the knowledge that the DOS drive prefix is
      always two bytes long.
      While this assumption is pretty safe, we can still make the code
      more readable and less error-prone by introducing a function that
      skips the DOS drive prefix safely.
      While at it, we change the has_dos_drive_prefix() return value: it
      now returns the number of bytes to be skipped if there is a DOS
      drive prefix.
      Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
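      A sketch of the new contract, with both helpers written as plain functions for illustration. The signatures, including const char **, are assumptions and are not copied from Git. has_dos_drive_prefix() reports how many bytes to skip, and skip_dos_drive_prefix() advances the pointer so that callers no longer hardcode the 2.

```c
#include <assert.h>
#include <ctype.h>

/*
 * Return the number of bytes in a DOS drive prefix such as "C:"
 * (always 2), or 0 if path has none.
 */
static int has_dos_drive_prefix(const char *path)
{
	return isalpha((unsigned char)path[0]) && path[1] == ':' ? 2 : 0;
}

/* Advance *path past a drive prefix; return how many bytes were skipped. */
static int skip_dos_drive_prefix(const char **path)
{
	int ret = has_dos_drive_prefix(*path);

	*path += ret;
	return ret;
}
```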
  23. 20 Nov, 2015 1 commit
  24. 09 Oct, 2015 1 commit
    • test-path-utils.c: remove incorrect assumption · b2a7123b
      Ray Donnelly authored
      In normalize_ceiling_entry(), we test that normalized paths end with a
      slash, *unless* the path to be normalized was already the root
      directory.
      However, normalize_path_copy() does not even enforce this condition.
      Even worse: on Windows, the root directory gets translated into a
      Windows directory by Bash before being passed to `git.exe` (or
      `test-path-utils.exe`), which means that we cannot even know whether
      the path that was passed to us was the root directory to begin with.
      This issue has already caused endless hours of trying to "fix" the
      MSYS2 runtime, only to break other things due to MSYS2 ensuring that
      the converted path maintains the same state as the input path with
      respect to any final '/'.
      So let's just forget about this test. It is non-essential to Git's
      operation, anyway.
      Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      Signed-off-by: Ray Donnelly <mingw.android@gmail.com>
  25. 05 Oct, 2015 3 commits
    • use strbuf_complete to conditionally append slash · 00b6c178
      Jeff King authored
      When working with paths in strbufs, we frequently want to
      ensure that a directory contains a trailing slash before
      appending to it. We can shorten this code (and make the
      intent more obvious) by calling strbuf_complete.
      Most of these cases are trivially identical conversions, but
      there are two things to note:
        - in a few cases we did not check that the strbuf is
          non-empty (which would lead to an out-of-bounds memory
          access). These were generally not triggerable in
          practice, either from earlier assertions, or typically
          because we would have just fed the strbuf to opendir(),
          which would choke on an empty path.
        - in a few cases we indexed the buffer with "original_len"
          or similar, rather than the current sb->len, and it is
          not immediately obvious from the diff that they are the
          same. In all of these cases, I manually verified that
          the strbuf does not change between the assignment and
          the strbuf_complete call.
      This does not convert cases which look like:
        if (sb->len && !is_dir_sep(sb->buf[sb->len - 1]))
          strbuf_addch(sb, '/');
      as those are obviously semantically different. Some of these
      cases arguably should be doing that, but that is out of
      scope for this change, which aims purely for cleanup with no
      behavior change (and at least it will make such sites easier
      to find and examine in the future, as we can grep for
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
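      The helper's logic fits in one condition: append the terminator only when the buffer is non-empty and does not already end with it. The sketch below uses a hypothetical fixed-size struct pathbuf in place of Git's growable struct strbuf. It compares against term exactly. That is why the sites testing is_dir_sep(), where a trailing backslash also counts, were left alone.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical fixed-capacity stand-in for Git's struct strbuf. */
struct pathbuf {
	char buf[256];
	size_t len;
};

/* Append c, silently dropping it if full (a real strbuf would grow). */
static void pathbuf_addch(struct pathbuf *sb, char c)
{
	if (sb->len + 1 < sizeof(sb->buf)) {
		sb->buf[sb->len++] = c;
		sb->buf[sb->len] = '\0';
	}
}

/*
 * Same condition as strbuf_complete(): append term unless the buffer is
 * empty or already ends with it. The len check avoids reading buf[-1].
 */
static void pathbuf_complete(struct pathbuf *sb, char term)
{
	if (sb->len && sb->buf[sb->len - 1] != term)
		pathbuf_addch(sb, term);
}
```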
    • remove_leading_path: use a strbuf for internal storage · 46357688
      Jeff King authored
      This function strcpy's directly into a PATH_MAX-sized
      buffer. There's only one caller, which feeds the git_dir into
      it, so it's not easy to trigger in practice (even if you fed
      a large $GIT_DIR through the environment or .git file, it
      would have to actually exist and be accessible on the
      filesystem to get to this point). We can fix it by moving to
      a strbuf.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • enter_repo: convert fixed-size buffers to strbufs · e9ba6781
      Jeff King authored
      We use two PATH_MAX-sized buffers to represent the repo
      path, and must make sure not to overflow them. We do take
      care to check the lengths, but the logic is rather hard to
      follow, as we use several magic numbers (e.g., "PATH_MAX -
      10"). And in fact you _can_ overflow the buffer if you have
      a ".git" file with an extremely long path in it.
      By switching to strbufs, these problems all go away. We do,
      however, retain the check that the initial input we get is
      no larger than PATH_MAX. This function is an entry point for
      untrusted repo names from the network, and it's a good idea
      to keep a sanity check (both to avoid allocating arbitrary
      amounts of memory, and also as a layer of defense against
      any downstream users of the names).
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  26. 28 Sep, 2015 2 commits