1. 17 Dec, 2008 1 commit
  2. 11 Dec, 2008 1 commit
  3. 28 Nov, 2008 2 commits
    • sha1_file.c: resolve confusion EACCES vs EPERM · 35243577
      Sam Vilain authored
      An earlier commit 916d081b (Nicer error messages in case saving an object
      to db goes wrong, 2006-11-09) confused EACCES with EPERM, the latter of
      which is an unlikely error from mkstemp().
      Signed-off-by: Sam Vilain <[email protected]>
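The distinction matters mostly for error reporting. A minimal sketch (illustrative, not git's actual code) of reporting mkstemp() failures with the errno that actually occurred — EACCES for a permission problem on the directory, rather than the unlikely EPERM:

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: pass mkstemp()'s real errno through to the
 * user instead of guessing at the cause. */
static int create_temp(char *template_buf)
{
    int fd = mkstemp(template_buf);
    if (fd < 0) {
        if (errno == EACCES)
            fprintf(stderr, "insufficient permission for %s\n", template_buf);
        else
            fprintf(stderr, "unable to create temporary file: %s\n",
                    strerror(errno));
    }
    return fd;
}
```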
    • sha1_file: avoid bogus "file exists" error message · 65117abc
      Joey Hess authored
      This avoids the following misleading error message:
      
      error: unable to create temporary sha1 filename ./objects/15: File exists
      
      mkstemp can fail for many reasons, one of which, ENOENT, can occur if
      the directory for the temp file doesn't exist. create_tmpfile tried to
      handle this case by always trying to mkdir the directory, even if it
      already existed. This caused errno to be clobbered, so one cannot tell
      why mkstemp really failed, and it truncated the buffer to just the
      directory name, resulting in the strange error message shown above.
      
      Note that on both occasions when I've seen this failure, it has not been
      due to a missing directory, or bad permissions, but some other, unknown
      mkstemp failure mode that did not occur when I ran git again. This code
      could perhaps be made more robust by retrying mkstemp, in case it was a
      transient failure.
      Signed-off-by: Joey Hess <[email protected]>
      Signed-off-by: Junio C Hamano <[email protected]>
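The shape of the fix can be sketched as follows (an illustrative simplification, not git's actual create_tmpfile): fall back to mkdir only when mkstemp reports ENOENT, so the errno from any other mkstemp failure survives for the caller to report.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

static int create_tmpfile(char *buf, size_t len, const char *dir)
{
    snprintf(buf, len, "%s/tmp_obj_XXXXXX", dir);
    int fd = mkstemp(buf);
    if (fd < 0 && errno == ENOENT) {
        /* The directory may be missing: create it and retry once.
         * Any errno other than ENOENT is left for the caller to see. */
        if (mkdir(dir, 0777) == 0 || errno == EEXIST) {
            snprintf(buf, len, "%s/tmp_obj_XXXXXX", dir);
            fd = mkstemp(buf);
        }
    }
    return fd;
}
```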
  4. 24 Nov, 2008 2 commits
  5. 12 Nov, 2008 3 commits
  6. 02 Nov, 2008 5 commits
    • make find_pack_revindex() aware of the nasty world · 08698b1e
      Nicolas Pitre authored
      It currently calls die() whenever a given offset is not found, thinking
      that such a thing should never happen.  But this offset may come from a
      corrupted pack, which _could_ happen, and therefore not be found.
      Callers should deal with this possibility gracefully instead.
      Signed-off-by: Nicolas Pitre <[email protected]>
      Signed-off-by: Junio C Hamano <[email protected]>
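Concretely, the change means a lookup of this shape returns NULL on a bogus offset instead of dying (the names here are illustrative, not git's exact structures):

```c
#include <stddef.h>

struct revindex_entry {
    unsigned long offset;   /* start of the object in the pack */
    unsigned int nr;        /* its index position */
};

/* Binary-search a sorted revindex; return NULL -- instead of die()ing --
 * when the offset does not match any object boundary. */
static struct revindex_entry *find_revindex(struct revindex_entry *idx,
                                            size_t n, unsigned long ofs)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mi = lo + (hi - lo) / 2;
        if (idx[mi].offset == ofs)
            return &idx[mi];
        if (idx[mi].offset < ofs)
            lo = mi + 1;
        else
            hi = mi;
    }
    return NULL; /* possibly a corrupted pack: let the caller decide */
}
```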
    • make packed_object_info() resilient to pack corruptions · 3d77d877
      Nicolas Pitre authored
      In the same spirit as commit 8eca0b47, let's try to survive a pack
      corruption by making packed_object_info() able to fall back to alternate
      packs or loose objects.
      Signed-off-by: Nicolas Pitre <[email protected]>
      Signed-off-by: Junio C Hamano <[email protected]>
    • make unpack_object_header() non fatal · 09ded04b
      Nicolas Pitre authored
      It is possible to have pack corruption in the object header.  Currently
      unpack_object_header() simply calls die() on them instead of letting the
      caller
      deal with that gracefully.
      
      So let's have unpack_object_header() return an error instead, and find
      a better name for unpack_object_header_gently() in that context.  All
      callers of unpack_object_header() are ready for it.
      Signed-off-by: Nicolas Pitre <[email protected]>
      Signed-off-by: Junio C Hamano <[email protected]>
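The pack object header is a small variable-length encoding: type in bits 4-6 of the first byte, size in the low 4 bits plus base-128 continuation bytes. A "gentle" decoder in the spirit of this change might look like the sketch below (illustrative, not git's exact code):

```c
#include <stddef.h>

/* Returns the number of header bytes consumed, or -1 on a truncated or
 * absurdly long header so the caller can recover instead of dying. */
static int unpack_header_gently(const unsigned char *buf, size_t len,
                                int *type, unsigned long *size)
{
    size_t used = 0;
    unsigned char c;
    int shift = 4;

    if (!len)
        return -1;
    c = buf[used++];
    *type = (c >> 4) & 7;
    *size = c & 15;
    while (c & 0x80) {
        if (used >= len || shift > 32)
            return -1;
        c = buf[used++];
        *size += (unsigned long)(c & 0x7f) << shift;
        shift += 7;
    }
    return (int)used;
}
```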
    • better validation on delta base object offsets · d8f32556
      Nicolas Pitre authored
      In one case, it was possible to have a bad offset equal to 0 effectively
      pointing a delta onto itself and crashing git after too many recursions.
      In the other cases, a negative offset could result due to off_t being
      signed.  Catch those.
      Signed-off-by: Nicolas Pitre <[email protected]>
      Signed-off-by: Junio C Hamano <[email protected]>
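For the OBJ_OFS_DELTA case, the checks boil down to something like this sketch (illustrative): the decoded distance must be non-zero, must not reach at or past the delta's own position, and must not overflow a signed offset while accumulating.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode an OFS_DELTA base-offset field and validate it: returns the
 * absolute pack offset of the base object, or -1 for a truncated,
 * overflowing, self-referencing, or out-of-bounds encoding. */
static int64_t resolve_base_offset(const unsigned char *buf, size_t len,
                                   int64_t delta_offset)
{
    size_t used = 0;
    unsigned char c;
    int64_t base;

    if (!len)
        return -1;
    c = buf[used++];
    base = c & 127;
    while (c & 128) {
        if (used >= len)
            return -1;                  /* truncated encoding */
        base += 1;
        if (base > (INT64_MAX >> 7))
            return -1;                  /* would overflow a signed off_t */
        c = buf[used++];
        base = (base << 7) + (c & 127);
    }
    if (base <= 0 || base >= delta_offset)
        return -1;  /* zero would point the delta at itself; too large
                       would point before the start of the pack */
    return delta_offset - base;
}
```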
    • close another possibility for propagating pack corruption · 0e8189e2
      Nicolas Pitre authored
      Abstract
      --------
      
      With index v2 we have a per object CRC to allow quick and safe reuse of
      pack data when repacking.  This, however, doesn't currently prevent a
      stealth corruption from being propagated into a new pack when _not_
      reusing pack data as demonstrated by the modification to t5302 included
      here.
      
      The Context
      -----------
      
      The Git database is all checksummed with SHA1 hashes.  Any kind of
      corruption can be confirmed by verifying this per object hash against
      corresponding data.  However this can be costly to perform systematically
      and therefore this check is often not performed at run time when
      accessing the object database.
      
      First, the loose object format is entirely compressed with zlib which
      already provides a CRC verification of its own when inflating data.  Any
      disk corruption would be caught already in this case.
      
      Then, packed objects are also compressed with zlib but only for their
      actual payload.  The object headers and delta base references are not
      deflated for obvious performance reasons; however, this leaves them
      vulnerable to potentially undetected disk corruptions.  Object types
      are often validated against the expected type when they're requested,
      and deflated size must always match the size recorded in the object header,
      so those cases are pretty much covered as well.
      
      Where corruptions could go unnoticed is in the delta base reference.
      Of course, in the OBJ_REF_DELTA case, the odds for a SHA1 reference to
      get corrupted so it actually matches the SHA1 of another object with the
      same size (the delta header stores the expected size of the base object
      to apply against) are virtually zero.  In the OBJ_OFS_DELTA case, the
      reference is a pack offset which would have to match the start boundary
      of a different base object but still with the same size, and although this
      is relatively much more "probable" than in the OBJ_REF_DELTA case, the
      probability is also about zero in absolute terms.  Still, the possibility
      exists as demonstrated in t5302 and is certainly greater than a SHA1
      collision, especially in the OBJ_OFS_DELTA case which is now the default
      when repacking.
      
      Again, repacking by reusing existing pack data is OK since the per object
      CRC provided by index v2 guards against any such corruptions. What t5302
      failed to test was a full repack in that case.
      
      The Solution
      ------------
      
      As unlikely as this kind of stealth corruption can be in practice, it
      certainly isn't acceptable to propagate it into a freshly created pack.
      But, because this is so unlikely, we don't want to pay the run time cost
      associated with extra validation checks all the time either.  Furthermore,
      consequences of such corruption in anything but repacking should be rather
      visible, and even if it could be quite unpleasant, it still has far less
      severe consequences than actively creating bad packs.
      
      So the best compromise is to check packed object CRC when unpacking
      objects, and only during the compression/writing phase of a repack, and
      only when not streaming the result.  The cost of this is minimal (less
      than 1% CPU time), and visible only with a full repack.
      
      Someone with a stats background could provide an objective evaluation of
      this, but I suspect that it's bad RAM that has more potential for data
      corruptions at this point, even in those cases where this extra check
      is not performed.  Still, it is best to prevent a known hole for
      corruption when recreating object data into a new pack.
      
      What about the streamed pack case?  Well, any client receiving a pack
      must always consider that pack as untrusted and perform full validation
      anyway, hence no such stealth corruption could be propagated to remote
      repositories in the first place.  It is therefore pointless to do local
      validation in that case.
      Signed-off-by: Nicolas Pitre <[email protected]>
      Signed-off-by: Junio C Hamano <[email protected]>
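The check itself is cheap: before recompressing an object's raw bytes during a repack, compare their CRC32 against the value recorded in the v2 index. A self-contained sketch (git uses zlib's crc32(); the bitwise version below merely keeps the example dependency-free):

```c
#include <stddef.h>
#include <stdint.h>

/* Standard CRC-32 (reflected, polynomial 0xEDB88320), computed bit by
 * bit for illustration. */
static uint32_t crc32_bytes(const unsigned char *p, size_t n)
{
    uint32_t crc = 0xffffffffu;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xedb88320u & (uint32_t)-(int)(crc & 1));
    }
    return ~crc;
}

/* Compare the recomputed CRC of an object's packed bytes against the
 * CRC stored in the pack's v2 index; non-zero means it matches. */
static int check_pack_crc(const unsigned char *data, size_t len,
                          uint32_t index_crc)
{
    return crc32_bytes(data, len) == index_crc;
}
```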
  7. 19 Oct, 2008 1 commit
  8. 18 Oct, 2008 1 commit
  9. 12 Oct, 2008 1 commit
  10. 09 Oct, 2008 1 commit
  11. 03 Oct, 2008 1 commit
    • fix openssl headers conflicting with custom SHA1 implementations · 9126f009
      Nicolas Pitre authored
      On ARM I have the following compilation errors:
      
          CC fast-import.o
      In file included from cache.h:8,
                       from builtin.h:6,
                       from fast-import.c:142:
      arm/sha1.h:14: error: conflicting types for 'SHA_CTX'
      /usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here
      arm/sha1.h:16: error: conflicting types for 'SHA1_Init'
      /usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here
      arm/sha1.h:17: error: conflicting types for 'SHA1_Update'
      /usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here
      arm/sha1.h:18: error: conflicting types for 'SHA1_Final'
      /usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here
      make: *** [fast-import.o] Error 1
      
      This is because openssl header files are always included in
      git-compat-util.h since commit 684ec6c6 whenever NO_OPENSSL is not
      set, which somehow brings in <openssl/sha.h> clashing with the custom
      ARM version.  Compilation of git is probably broken on PPC too for the
      same reason.
      
      Turns out that the only file requiring openssl/ssl.h and openssl/err.h
      is imap-send.c.  But only moving those problematic includes there
      doesn't solve the issue as it also includes cache.h which brings in the
      conflicting local SHA1 header file.
      
      As suggested by Jeff King, the best solution is to rename our references
      to SHA1 functions and structure to something git specific, and define those
      according to the implementation used.
      Signed-off-by: Nicolas Pitre <[email protected]>
      Signed-off-by: Shawn O. Pearce <[email protected]>
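The rename gives git its own names (git_SHA_CTX, git_SHA1_Init, ...) that are mapped onto whichever implementation the build selects. The indirection can be sketched like this, with a trivial stand-in in place of a real SHA-1 implementation:

```c
/* Stand-in for a platform header such as arm/sha1.h -- NOT a real
 * SHA-1, just enough to show the aliasing. */
typedef struct { unsigned long total_len; } platform_SHA_CTX;
static void platform_SHA1_Init(platform_SHA_CTX *c) { c->total_len = 0; }
static void platform_SHA1_Update(platform_SHA_CTX *c, const void *p,
                                 unsigned long n) { (void)p; c->total_len += n; }

/* git-side aliases: nothing named SHA_CTX or SHA1_* remains to clash
 * with <openssl/sha.h>.  Real git selects these mappings at build time. */
#define git_SHA_CTX     platform_SHA_CTX
#define git_SHA1_Init   platform_SHA1_Init
#define git_SHA1_Update platform_SHA1_Update
```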
  12. 19 Sep, 2008 1 commit
  13. 09 Sep, 2008 2 commits
    • push: receiver end advertises refs from alternate repositories · d79796bc
      Junio C Hamano authored
      Earlier, when pushing into a repository that borrows from alternate object
      stores, we followed the longstanding design decision not to trust refs in
      the alternate repository that houses the object store we are borrowing
      from.  If your public repository is borrowing from Linus's public
      repository, you pushed into it a long time ago, and now when you try to push
      your updated history that is in sync with more recent history from Linus,
      you will end up sending not just your own development, but also the
      changes you acquired through Linus's tree, even though the objects needed
      for the latter already exist at the receiving end.  This is because the
      receiving end does not advertise that the objects only reachable from the
      borrowed repository (i.e. Linus's) are already available there.
      
      This solves the issue by making the receiving end advertise refs from
      borrowed repositories.  They are not sent with their true names but with a
      phoney name ".have" to make sure that the old senders will safely ignore
      them (otherwise, the old senders will misbehave, trying to push matching
      refs, and mirror push that deletes refs that only exist at the receiving
      end).
      Signed-off-by: Junio C Hamano <[email protected]>
    • is_directory(): a generic helper function · 90b4a71c
      Junio C Hamano authored
      A simple "grep -e stat --and -e S_ISDIR" revealed there are many
      open-coded implementations of this function.
      Signed-off-by: Junio C Hamano <[email protected]>
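The helper amounts to a single stat() call; a minimal sketch:

```c
#include <sys/stat.h>

/* Returns non-zero iff path exists and is a directory; any stat()
 * failure (missing path, permission problem, ...) counts as "no". */
static int is_directory(const char *path)
{
    struct stat st;
    return !stat(path, &st) && S_ISDIR(st.st_mode);
}
```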
  14. 04 Sep, 2008 1 commit
    • safe_create_leading_directories(): make it about "leading" directories · 5f0bdf50
      Junio C Hamano authored
      We used to allow callers to pass "foo/bar/" to make sure both "foo" and
      "foo/bar" exist and have good permissions, but this interface is too error
      prone.  If a caller mistakenly passes a path with trailing slashes
      (perhaps it forgot to verify the user input) even when it wants to later
      mkdir "bar" itself, it will find that it cannot mkdir "bar".  If such a
      caller does not bother to check the error for EEXIST, it may even
      erroneously die().
      
      Because we have no existing callers that use that obscure feature, this
      patch removes it to avoid confusion.
      Signed-off-by: Junio C Hamano <[email protected]>
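After this change the contract is strictly about leading components: for "foo/bar/baz" the function ensures "foo" and "foo/bar" exist but never creates "baz". A simplified sketch of that contract (error handling trimmed; git's version also copes with components that exist as non-directories):

```c
#include <errno.h>
#include <string.h>
#include <sys/stat.h>

static int create_leading_dirs(char *path)
{
    char *p;

    for (p = strchr(path + 1, '/'); p; p = strchr(p + 1, '/')) {
        *p = '\0';                      /* temporarily end at this component */
        if (mkdir(path, 0777) && errno != EEXIST) {
            *p = '/';
            return -1;
        }
        *p = '/';
    }
    return 0;                           /* the final component is untouched */
}
```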
  15. 23 Aug, 2008 1 commit
  16. 06 Aug, 2008 1 commit
  17. 03 Aug, 2008 1 commit
    • teach index_fd to work with pipes · 43df4f86
      Dmitry Potapov authored
      index_fd can now work with file descriptors that are not regular files
      but refer to anything readable.  If the given file descriptor is a regular file
      then mmap() is used; for other files, strbuf_read is used.
      
      The path parameter, which has been used as hint for filters, can be
      NULL now to indicate that the file should be hashed literally without
      any filter.
      
      The index_pipe function is removed as redundant.
      Signed-off-by: Dmitry Potapov <[email protected]>
      Signed-off-by: Junio C Hamano <[email protected]>
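The dispatch described above can be sketched as: fstat() the descriptor, mmap() regular files, and fall back to reading into a growing buffer for pipes and other stream-like descriptors (read_all below is a stand-in for git's strbuf_read):

```c
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Read everything from fd into a malloc'd buffer (stand-in for
 * strbuf_read).  Returns the size via *len, or NULL on failure. */
static void *read_all(int fd, size_t *len)
{
    size_t cap = 4096, n = 0;
    char *buf = malloc(cap);
    ssize_t r;

    if (!buf)
        return NULL;
    while ((r = read(fd, buf + n, cap - n)) > 0) {
        n += (size_t)r;
        if (n == cap) {
            char *nb = realloc(buf, cap *= 2);
            if (!nb) { free(buf); return NULL; }
            buf = nb;
        }
    }
    if (r < 0) { free(buf); return NULL; }
    *len = n;
    return buf;
}

/* Dispatch test: a regular file could be mmap()ed instead; everything
 * else (pipes, sockets, ...) must be streamed. */
static int fd_is_regular(int fd)
{
    struct stat st;
    return !fstat(fd, &st) && S_ISREG(st.st_mode);
}
```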
  18. 15 Jul, 2008 1 commit
    • restore legacy behavior for read_sha1_file() · ac939109
      Nicolas Pitre authored
      Since commit 8eca0b47, it is possible
      for read_sha1_file() to return NULL even with existing objects when they
      are corrupted.  Previously a corrupted object would have terminated the
      program immediately, effectively making read_sha1_file() return NULL
      only when the specified object is not found.
      
      Let's restore this behavior for all users of read_sha1_file() and
      provide a separate function with the ability to not terminate when
      bad objects are encountered.
      Signed-off-by: Nicolas Pitre <[email protected]>
      Signed-off-by: Junio C Hamano <[email protected]>
  19. 09 Jul, 2008 1 commit
    • Correct pack memory leak causing git gc to try to exceed ulimit · eac12e2d
      Shawn O. Pearce authored
      When recursing to unpack a delta base we must unuse_pack() so that
      the pack window for the current object does not remain pinned in
      memory while the delta base is itself being unpacked and materialized
      for our use.
      
      On a long delta chain of 50 objects we may need to access 6 different
      windows from a very large (>3G) pack file in order to obtain all
      of the delta base content.  If the process ulimit permits us to
      map/allocate only 1.5G we must release windows during this recursion
      to ensure we stay within the ulimit and transition memory from pack
      cache to standard malloc, or other mmap needs.
      
      Inserting an unuse_pack() call prior to the recursion allows us to
      avoid pinning the current window, making it available for garbage
      collection if memory runs low.
      
      This has been broken since at least before 1.5.1-rc1, and very
      likely earlier than that.  It's fixed now.  :)
      Signed-off-by: Shawn O. Pearce <[email protected]>
      Signed-off-by: Junio C Hamano <[email protected]>
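The pinning discipline behind the fix can be illustrated with a simple use-count (names illustrative, not git's exact window code):

```c
/* Each mapped pack window carries a use count; a window with
 * inuse_cnt > 0 can never be unmapped, however low memory gets. */
struct pack_window {
    int inuse_cnt;
};

static void use_window(struct pack_window *w)   { w->inuse_cnt++; }
static void unuse_window(struct pack_window *w) { w->inuse_cnt--; }

/* The fix amounts to calling unuse_window() before recursing for the
 * delta base, so this window is evictable during the recursion. */
static int window_evictable(const struct pack_window *w)
{
    return w->inuse_cnt == 0;
}
```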
  20. 06 Jul, 2008 1 commit
  21. 25 Jun, 2008 4 commits
    • clone: create intermediate directories of destination repo · 2beebd22
      Jeff King authored
      The shell version used to use "mkdir -p" to create the repo
      path, but the C version just calls "mkdir". Let's replicate
      the old behavior. We have to create the git and worktree
      leading dirs separately; while most of the time, the
      worktree dir contains the git dir (as .git), the user can
      override this using GIT_WORK_TREE.
      
      We can reuse safe_create_leading_directories, but we need to
      make a copy of our const buffer to do so. Since
      merge-recursive uses the same pattern, we can factor this
      out into a global function. This has two other cleanup
      advantages for merge-recursive:
      
        1. mkdir_p wasn't a very good name. "mkdir -p foo/bar" actually
           creates bar, but this function just creates the leading
           directories.
      
        2. mkdir_p took a mode argument, but it was completely
           ignored.
      Acked-by: Daniel Barkalow <[email protected]>
      Signed-off-by: Junio C Hamano <[email protected]>
    • optimize verify-pack a bit · 99093238
      Nicolas Pitre authored
      Using find_pack_entry_one() to get object offsets is rather suboptimal
      when nth_packed_object_offset() can be used directly.
      Signed-off-by: Nicolas Pitre <[email protected]>
      Signed-off-by: Junio C Hamano <[email protected]>
    • clone: create intermediate directories of destination repo · 8e21d63b
      Jeff King authored
      The shell version used to use "mkdir -p" to create the repo
      path, but the C version just calls "mkdir". Let's replicate
      the old behavior. We have to create the git and worktree
      leading dirs separately; while most of the time, the
      worktree dir contains the git dir (as .git), the user can
      override this using GIT_WORK_TREE.
      
      We can reuse safe_create_leading_directories, but we need to
      make a copy of our const buffer to do so. Since
      merge-recursive uses the same pattern, we can factor this
      out into a global function. This has two other cleanup
      advantages for merge-recursive:
      
        1. mkdir_p wasn't a very good name. "mkdir -p foo/bar" actually
           creates bar, but this function just creates the leading
           directories.
      
        2. mkdir_p took a mode argument, but it was completely
           ignored.
      Acked-by: Daniel Barkalow <[email protected]>
      Signed-off-by: Junio C Hamano <[email protected]>
    • refactor pack structure allocation · 27d69a46
      Nicolas Pitre authored
      New pack structures are currently allocated in 2 different places
      and all members have to be initialized explicitly.  This is prone
      to errors leading to segmentation faults as found by Teemu Likonen.
      
      Let's have a common place where this structure is allocated, and have
      all members explicitly initialized to zero.
      Signed-off-by: Nicolas Pitre <[email protected]>
      Signed-off-by: Junio C Hamano <[email protected]>
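A single allocation helper in the spirit of this change (field names illustrative): calloc() zeroes every member, and any non-zero default lives in exactly one place.

```c
#include <stdlib.h>

struct packed_git {
    void *windows;
    long pack_size;
    int pack_fd;
};

/* One place to allocate the structure: everything starts zeroed, and
 * any non-zero default is set here and nowhere else. */
static struct packed_git *alloc_packed_git(void)
{
    struct packed_git *p = calloc(1, sizeof(*p));
    if (p)
        p->pack_fd = -1;
    return p;
}
```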
  22. 24 Jun, 2008 1 commit
    • implement some resilience against pack corruptions · 8eca0b47
      Nicolas Pitre authored
      We should be able to fall back to loose objects or alternative packs when
      a pack becomes corrupted.  This is especially true when an object exists
      in one pack only as a delta but its base object is corrupted.  Currently
      there is no way to retrieve the former object even if the latter is
      available in another pack or loose.
      
      This patch allows for a delta to be resolved (with a performance cost)
      using a base object from a source other than the pack where that delta
      is located.  The same applies to non-delta objects: rather than failing
      outright, a search is made in other packs or among loose objects when
      the copy in the currently active pack is corrupted.
      
      Of course git will become extremely noisy with error messages when that
      happens.  However, if the operation succeeds nevertheless, a simple
      'git repack -a -f -d' will "fix" the corrupted repository given that all
      corrupted objects have a good duplicate somewhere in the object store,
      possibly manually copied from another source.
      Signed-off-by: Nicolas Pitre <[email protected]>
      Signed-off-by: Junio C Hamano <[email protected]>
  23. 23 Jun, 2008 2 commits
  24. 22 Jun, 2008 1 commit
  25. 18 Jun, 2008 1 commit
    • Add config option to enable 'fsync()' of object files · aafe9fba
      Linus Torvalds authored
      As explained in the documentation[*] this is totally useless on
      filesystems that do ordered/journalled data writes, but it can be a
      useful safety feature on filesystems like HFS+ that only journal the
      metadata, not the actual file contents.
      
      It defaults to off, although we could presumably in theory some day
      auto-enable it on a per-filesystem basis.
      
      [*] Yes, I updated the docs for the thing.  Hell really _has_ frozen
          over, and the four horsemen are probably just beyond the horizon.
          EVERYBODY PANIC!
      Signed-off-by: Linus Torvalds <[email protected]>
      Signed-off-by: Junio C Hamano <[email protected]>
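When the option is on, the object write path gains one fsync() before close. A sketch of that shape (the flag name fsync_object_files matches the commit; the rest is an illustrative simplification of git's write path):

```c
#include <fcntl.h>
#include <unistd.h>

static int fsync_object_files;  /* set from configuration; defaults to off */

/* Write an object file, optionally forcing file data to disk before the
 * descriptor is closed.  Returns 0 on success, -1 on any failure. */
static int write_object_file(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0444);
    if (fd < 0)
        return -1;
    if (write(fd, buf, len) != (ssize_t)len ||
        (fsync_object_files && fsync(fd)) ||
        close(fd))
        return -1;
    return 0;
}
```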
  26. 17 Jun, 2008 2 commits