1. 19 Mar, 2015 1 commit
    • sha1fd_check: die when we cannot open the file · 599d2231
      Jeff King authored
      Right now we return a NULL "struct sha1file" if we encounter
      an error. However, the sole caller (write_idx_file) does not
      check the return value, and will segfault if we hit this
      case.
      
      One option would be to handle the error in the caller.
      However, there's really nothing for it to do but die. This
      code path is hit during "git index-pack --verify"; after we
      verify the packfile, we check that the ".idx" we would
      generate from it is byte-wise identical to what is on disk.
      We hit the error (and segfault) if we can't open the .idx
      file (a likely cause of this is that somebody else ran "git
      repack -ad" while we were verifying). Since we can't
      complete the requested verification, we really have no
      choice but to die.
      
      Furthermore, the rest of the sha1fd_* functions simply die
      on errors. So if we were to open the file successfully, for
      example, and then hit a read error, sha1write would call
      die() for us. So pushing the die() down into sha1fd_check
      keeps the interface consistent.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  2. 26 Dec, 2013 1 commit
    • do not pretend sha1write returns errors · 9af270e8
      Jeff King authored
      The sha1write function returns an int, but it will always be
      "0". The failure-prone parts of the function happen in the
      "flush" callback, which cannot pass an error back to us. So
      we just end up calling die() during the flush.
      
      Let's just drop the return value altogether, as it only
      confuses callers into thinking that it might be useful.
      
      Only one call site actually checked the return value. We can
      drop that check, since it just led to a die() anyway.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  3. 24 Oct, 2013 1 commit
  4. 30 Nov, 2011 1 commit
    • csum-file: introduce sha1file_checkpoint · 6c526148
      Junio C Hamano authored
      It is useful to be able to rewind a check-summed file to a certain
      previous state after writing data into it using sha1write() API. The
      fast-import command does this after streaming a blob data to the packfile
      being generated and then noticing that the same blob has already been
      written, and it does this with a private code truncate_pack() that is
      commented as "Yes, this is a layering violation".
      
      Introduce two API functions: sha1file_checkpoint(), which allows the caller
      to save the state of a sha1file, and sha1file_truncate(), which later
      reverts it to the saved state.  Use them to reimplement truncate_pack().
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
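The checkpoint-and-rewind idea can be sketched on a toy in-memory file. This is only an illustration under stated assumptions: `struct sumfile`, `sumfile_save()` and `sumfile_rewind()` are hypothetical names, and a simple multiplicative sum stands in for the SHA-1 context that the real API would have to save and restore.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a check-summed file: an in-memory buffer
 * plus a running checksum (NOT SHA-1, just a placeholder). */
struct sumfile {
	unsigned char data[4096];
	size_t len;
	uint32_t sum;
};

/* Everything needed to rewind both the contents and the checksum. */
struct sumfile_mark {
	size_t len;
	uint32_t sum;
};

void sumfile_init(struct sumfile *f)
{
	f->len = 0;
	f->sum = 0;
}

/* Append bytes and fold them into the checksum; -1 if they do not fit. */
int sumfile_write(struct sumfile *f, const void *buf, size_t n)
{
	const unsigned char *p = buf;

	if (n > sizeof(f->data) - f->len)
		return -1;
	memcpy(f->data + f->len, p, n);
	for (size_t i = 0; i < n; i++)
		f->sum = f->sum * 31 + p[i];
	f->len += n;
	return 0;
}

/* Record the current length and checksum state. */
void sumfile_save(const struct sumfile *f, struct sumfile_mark *m)
{
	m->len = f->len;
	m->sum = f->sum;
}

/* Drop everything written after the mark; -1 if the mark is invalid. */
int sumfile_rewind(struct sumfile *f, const struct sumfile_mark *m)
{
	if (m->len > f->len)
		return -1;
	f->len = m->len;
	f->sum = m->sum;
	return 0;
}
```

A caller in the fast-import situation described above would save a mark before streaming a blob and rewind to it on discovering that the blob was already written. Saving the checksum state alongside the offset is what keeps the final checksum consistent with the truncated contents.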
  5. 03 Apr, 2011 1 commit
  6. 28 Feb, 2011 1 commit
    • index-pack: --verify · e337a04d
      Junio C Hamano authored
      Given an existing .pack file and the .idx file that describes it,
      this new mode of operation reads and re-indexes the packfile and makes
      sure the existing .idx file matches the result byte-for-byte.
      
      All the objects in the .pack file are validated during this operation as
      well.  Unlike verify-pack, which visits each object described in the .idx
      file in the SHA-1 order, index-pack efficiently exploits the delta-chain
      to avoid rebuilding the objects that are used as the base of deltified
      objects over and over again while validating the objects, resulting in
      much quicker verification of the .pack file and its .idx file.
      
      This version, however, cannot verify a .pack/.idx pair with a handcrafted v2
      index that uses the 64-bit offset representation for offsets that would fit
      within 31 bits. You can create such an .idx file by giving a custom offset
      to the --index-version option of the command.
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  7. 27 Jun, 2009 1 commit
  8. 10 Oct, 2008 1 commit
    • fix pread()'s short read in index-pack · 838cd346
      Nicolas Pitre authored
      Since v1.6.0.2~13^2~ the completion of a thin pack uses sha1write() for
      its ability to compute a SHA1 on the written data.  This also provides
      data buffering which, along with commit 92392b4a, will confuse pread()
      whenever an appended object is 1) freed due to memory pressure because
      of the depth-first delta processing, and 2) needed again because it has
      many delta children, and 3) its data is still buffered by sha1write().
      
      Let's fix the issue by simply forcing cached data out when such an
      object is written so it can be pread()'d at leisure.
      Signed-off-by: Nicolas Pitre <nico@cam.org>
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
  9. 03 Oct, 2008 1 commit
    • fix openssl headers conflicting with custom SHA1 implementations · 9126f009
      Nicolas Pitre authored
      On ARM I have the following compilation errors:
      
          CC fast-import.o
      In file included from cache.h:8,
                       from builtin.h:6,
                       from fast-import.c:142:
      arm/sha1.h:14: error: conflicting types for 'SHA_CTX'
      /usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here
      arm/sha1.h:16: error: conflicting types for 'SHA1_Init'
      /usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here
      arm/sha1.h:17: error: conflicting types for 'SHA1_Update'
      /usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here
      arm/sha1.h:18: error: conflicting types for 'SHA1_Final'
      /usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here
      make: *** [fast-import.o] Error 1
      
      This is because openssl header files are always included in
      git-compat-util.h since commit 684ec6c6 whenever NO_OPENSSL is not
      set, which somehow brings in <openssl/sha.h>, clashing with the custom
      ARM version.  Compilation of git is probably broken on PPC too for the
      same reason.
      
      Turns out that the only file requiring openssl/ssl.h and openssl/err.h
      is imap-send.c.  But only moving those problematic includes there
      doesn't solve the issue as it also includes cache.h which brings in the
      conflicting local SHA1 header file.
      
      As suggested by Jeff King, the best solution is to rename our references
      to SHA1 functions and structure to something git specific, and define those
      according to the implementation used.
      Signed-off-by: Nicolas Pitre <nico@cam.org>
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
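The renaming approach can be sketched as follows. This is a hedged illustration, not the actual patch: the `toy_*` functions are a made-up placeholder implementation (they do not compute SHA-1), and the `git_SHA*` macro names are assumed here as the project-specific spelling. The point is only that callers use one private set of names, which each build maps onto whichever implementation was selected, so no caller ever refers to the OpenSSL identifiers directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder "implementation" standing in for a platform-specific
 * SHA-1 (for example an ARM assembly version). It is NOT SHA-1. */
typedef struct {
	uint32_t state;
} toy_SHA_CTX;

void toy_SHA1_Init(toy_SHA_CTX *c)
{
	c->state = 0;
}

void toy_SHA1_Update(toy_SHA_CTX *c, const void *data, size_t len)
{
	const unsigned char *p = data;

	for (size_t i = 0; i < len; i++)
		c->state = c->state * 31 + p[i];
}

/* Emit the 32-bit state in big-endian order. */
void toy_SHA1_Final(unsigned char out[4], toy_SHA_CTX *c)
{
	for (int i = 0; i < 4; i++)
		out[i] = (unsigned char)(c->state >> (24 - 8 * i));
}

/* Project-specific names (assumed spelling). Callers use only these;
 * a different build could map them onto another implementation. */
#define git_SHA_CTX     toy_SHA_CTX
#define git_SHA1_Init   toy_SHA1_Init
#define git_SHA1_Update toy_SHA1_Update
#define git_SHA1_Final  toy_SHA1_Final
```

Because the mapping lives in one header, a custom implementation can declare its own types without colliding with identically named declarations pulled in from a system header.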
  10. 03 Sep, 2008 1 commit
    • sha1write: don't copy full sized buffers · a8032d12
      Nicolas Pitre authored
      No need to memcpy() source buffer data when we might just process the
      data in place instead of accumulating it into a separate buffer.
      This is the case when a whole buffer would have been copied, summed,
      written out and then discarded right away.
      
      Also move the CRC32 processing within the loop so the data is more likely
      to remain in the L1 CPU cache between the CRC32 sum, SHA1 sum and the
      write call.
      Signed-off-by: Nicolas Pitre <nico@cam.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
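A minimal sketch of the copy-avoidance described above, under stated assumptions: `struct bufwriter`, `bufwriter_write()` and the tiny `BUF_SIZE` are hypothetical, a placeholder sum replaces the CRC32 and SHA1 updates, and the "write call" is reduced to a byte counter. When the internal buffer is empty and the caller supplies at least a full buffer's worth of data, that chunk is summed and written straight from the caller's memory. Otherwise it is accumulated as before.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 8 /* tiny on purpose, to make the behavior easy to see */

struct bufwriter {
	unsigned char buf[BUF_SIZE];
	size_t used;    /* bytes currently held in buf */
	uint32_t sum;   /* placeholder for the CRC32/SHA1 state */
	size_t copies;  /* bytes that went through memcpy() */
	size_t flushed; /* bytes summed and handed to the sink */
};

/* Sum and "write out" n bytes; stands in for checksum update + write. */
static void bufwriter_consume(struct bufwriter *w,
			      const unsigned char *p, size_t n)
{
	for (size_t i = 0; i < n; i++)
		w->sum = w->sum * 31 + p[i];
	w->flushed += n;
}

void bufwriter_write(struct bufwriter *w, const void *src, size_t n)
{
	const unsigned char *p = src;

	while (n) {
		size_t room = BUF_SIZE - w->used;
		size_t chunk = n < room ? n : room;

		if (chunk == BUF_SIZE) {
			/* Buffer is empty and a full chunk is available:
			 * process it in place, no copy needed. */
			bufwriter_consume(w, p, chunk);
		} else {
			memcpy(w->buf + w->used, p, chunk);
			w->copies += chunk;
			w->used += chunk;
			if (w->used == BUF_SIZE) {
				bufwriter_consume(w, w->buf, BUF_SIZE);
				w->used = 0;
			}
		}
		p += chunk;
		n -= chunk;
	}
}
```

Both paths feed the same bytes to the checksum in the same order, so the result does not depend on how the caller splits its writes. Only the number of copied bytes changes.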
  11. 30 Aug, 2008 1 commit
  12. 31 May, 2008 1 commit
  13. 05 Nov, 2007 2 commits
    • remove dead code from the csum-file interface · ec640ed1
      Nicolas Pitre authored
      The provided name argument is always constant and valid in every
      caller's context, so no need to have an array of PATH_MAX chars to copy
      it into when a simple pointer will do.  Unfortunately that means getting
      rid of wascally wabbits too.
      
      The 'error' field is also unused.
      Signed-off-by: Nicolas Pitre <nico@cam.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • make display of total transferred more accurate · 218558af
      Nicolas Pitre authored
      The throughput display needs a delay period before accounting and
      displaying anything.  Yet it might be called after some amount of data
      has already been transferred.  The displayed total is therefore
      accounted late and ends up smaller than the real amount.
      
      Let's call display_throughput() with an absolute amount of transferred
      data instead of a relative number, and let the throughput code find the
      relative amount of data by itself as needed.  This way the displayed
      total is always exact.
      Signed-off-by: Nicolas Pitre <nico@cam.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
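The difference between reporting relative and absolute amounts can be sketched like this. The names are hypothetical (`struct throughput_sketch`, `throughput_update()`), and the delay handling is reduced to a flag supplied by the caller. It is not the real display_throughput() code. The caller always passes the absolute number of bytes transferred so far, and the meter derives the per-sample delta itself.

```c
#include <stdint.h>

struct throughput_sketch {
	uint64_t curr_total; /* latest absolute total, always exact */
	uint64_t prev_total; /* total at the previous rate sample */
	int started;         /* set once the initial delay has elapsed */
};

/*
 * Record the absolute total and return the bytes transferred since the
 * previous sample. Returns 0 while the initial delay is still running
 * and for the first sample after it.
 */
uint64_t throughput_update(struct throughput_sketch *tp,
			   uint64_t total, int delay_elapsed)
{
	uint64_t delta;

	tp->curr_total = total;
	if (!tp->started) {
		if (delay_elapsed) {
			tp->started = 1;
			tp->prev_total = total;
		}
		return 0;
	}
	delta = total - tp->prev_total;
	tp->prev_total = total;
	return delta;
}
```

Data transferred before the meter starts sampling is still reflected in `curr_total`, which is the property the commit message is after. With relative amounts, anything transferred before the first accounted call would simply be missing from the total.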
  14. 30 Oct, 2007 1 commit
  15. 17 Oct, 2007 1 commit
  16. 13 Jun, 2007 1 commit
  17. 21 May, 2007 1 commit
  18. 10 May, 2007 1 commit
    • Custom compression levels for objects and packs · 960ccca6
      Dana How authored
      Add config variables pack.compression and core.loosecompression,
      and the --compression=level switch to pack-objects.
      
      Loose objects will be compressed using core.loosecompression if set,
      else core.compression if set, else Z_BEST_SPEED.
      Packed objects will be compressed using --compression=level if seen,
      else pack.compression if set, else core.compression if set,
      else Z_DEFAULT_COMPRESSION.  This is the "pack compression level".
      
      Loose objects added to a pack undeltified will be recompressed
      to the pack compression level if it is unequal to the current
      loose compression level by the preceding rules, or if the loose
      object was written while core.legacyheaders = true.  Newly
      deltified loose objects are always compressed to the current
      pack compression level.
      
      Previously packed objects added to a pack are recompressed
      to the current pack compression level exactly when their
      deltification status changes, since the previous pack data
      cannot be reused.
      
      In either case, the --no-reuse-object switch from the first
      patch below will always force recompression to the current pack
      compression level, instead of assuming the pack compression level
      hasn't changed and pack data can be reused when possible.
      
      This applies on top of the following patches from Nicolas Pitre:
      [PATCH] allow for undeltified objects not to be reused
      [PATCH] make "repack -f" imply "pack-objects --no-reuse-object"
      Signed-off-by: Dana L. How <danahow@gmail.com>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
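The two fallback chains described above can be written down as a small sketch. Everything here is illustrative: the function names and the `LEVEL_UNSET` sentinel are hypothetical, and the two constants mirror the values of zlib's `Z_BEST_SPEED` (1) and `Z_DEFAULT_COMPRESSION` (-1) rather than including `zlib.h`.

```c
/* Sentinel for "not configured"; -1 cannot be used because it is a
 * valid zlib level (the library default). Hypothetical name. */
#define LEVEL_UNSET (-2)

/* Local copies of the zlib constants, to keep the sketch standalone. */
#define SKETCH_Z_BEST_SPEED 1
#define SKETCH_Z_DEFAULT_COMPRESSION (-1)

/* Loose objects: core.loosecompression, else core.compression,
 * else Z_BEST_SPEED. */
int loose_compression_level(int core_loosecompression, int core_compression)
{
	if (core_loosecompression != LEVEL_UNSET)
		return core_loosecompression;
	if (core_compression != LEVEL_UNSET)
		return core_compression;
	return SKETCH_Z_BEST_SPEED;
}

/* Packed objects: --compression=level, else pack.compression,
 * else core.compression, else Z_DEFAULT_COMPRESSION. */
int pack_compression_level(int cmdline_level, int pack_compression,
			   int core_compression)
{
	if (cmdline_level != LEVEL_UNSET)
		return cmdline_level;
	if (pack_compression != LEVEL_UNSET)
		return pack_compression;
	if (core_compression != LEVEL_UNSET)
		return core_compression;
	return SKETCH_Z_DEFAULT_COMPRESSION;
}
```

With nothing configured, loose objects get level 1 and packed objects get the zlib default. Setting only core.compression affects both, while the more specific settings override it for their own kind of object.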
  19. 10 Apr, 2007 1 commit
    • compute a CRC32 for each object as stored in a pack · 78d1e84f
      Nicolas Pitre authored
      The most important optimization for performance when repacking is the
      ability to reuse data from a previous pack as is and bypass any delta
      or even SHA1 computation by simply copying the raw data from one pack
      to another directly.
      
      The problem with this is that any data corruption within a copied object
      would go unnoticed and the new (repacked) pack would be self-consistent
      with its own checksum despite containing a corrupted object.  This is a
      real issue that already happened at least once in the past.
      
      In an attempt to prevent this, we validate the copied data by inflating
      it and making sure no error is signaled by zlib.  But this is still not
      perfect, as a significant portion of a pack's content is made of object
      headers and references to delta base objects, which are not deflated and
      therefore not validated when repacking.  Pack data reuse is thus
      still not as safe as it could be.
      
      Of course a full SHA1 validation could be performed, but that implies
      full data inflating and delta replaying, which is extremely costly: exactly
      the cost the data reuse optimization was designed to avoid in the first place.
      
      So the best solution to this is simply to store a CRC32 of the raw pack
      data for each object in the pack index.  This way any object in a pack can
      be validated before being copied as is in another pack, including header
      and any other non deflated data.
      
      Why CRC32 instead of a faster checksum like Adler32?  Quoting Wikipedia:
      
         Jonathan Stone discovered in 2001 that Adler-32 has a weakness for very
         short messages. He wrote "Briefly, the problem is that, for very short
         packets, Adler32 is guaranteed to give poor coverage of the available
         bits. Don't take my word for it, ask Mark Adler. :-)" The problem is
         that sum A does not wrap for short messages. The maximum value of A for
         a 128-byte message is 32640, which is below the value 65521 used by the
         modulo operation. An extended explanation can be found in RFC 3309,
         which mandates the use of CRC32 instead of Adler-32 for SCTP, the
         Stream Control Transmission Protocol.
      
      In the context of a GIT pack, we have lots of small objects, especially
      deltas, which are likely to be quite small and in a size range for which
      Adler32 is deemed not to be sufficient.  Another advantage of CRC32 is the
      possibility for recovery from certain types of small corruptions like
      single bit errors which are the most probable type of corruptions.
      
      OK, what this patch does is compute the CRC32 of each object written to
      a pack within pack-objects.  It is not written to the index yet and it is
      obviously not validated when reusing pack data yet either.
      Signed-off-by: Nicolas Pitre <nico@cam.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
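The short-message weakness quoted above is easy to observe with a plain Adler-32 implementation. The sketch below is a textbook version written for illustration (the struct and function names are hypothetical). It exposes the two running sums separately so that A can be inspected for a 128-byte input.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u /* largest prime below 2^16 */

struct adler_sums {
	uint32_t a; /* 1 + sum of all bytes, modulo ADLER_MOD */
	uint32_t b; /* sum of all intermediate values of a, modulo ADLER_MOD */
};

/* Compute both Adler-32 sums over n bytes. */
struct adler_sums adler32_sums(const unsigned char *p, size_t n)
{
	struct adler_sums s = { 1, 0 };

	for (size_t i = 0; i < n; i++) {
		s.a = (s.a + p[i]) % ADLER_MOD;
		s.b = (s.b + s.a) % ADLER_MOD;
	}
	return s;
}

/* The final checksum packs b into the high half and a into the low half. */
uint32_t adler32_value(struct adler_sums s)
{
	return (s.b << 16) | s.a;
}
```

Feeding it 128 bytes of 0xFF, the worst case for that length, leaves A far below 65521, so the modulo never takes effect and the upper bits of A are always zero. Small deltas in a pack fall into exactly this size range.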
  20. 23 Aug, 2006 1 commit
    • Convert memcpy(a,b,20) to hashcpy(a,b). · e702496e
      Shawn Pearce authored
      This abstracts away the size of the hash values when copying them
      from memory location to memory location, much as the introduction
      of hashcmp abstracted away hash value comparison.
      
      A few call sites were using char* rather than unsigned char* so
      I added the cast rather than open hashcpy to be void*.  This is a
      reasonable tradeoff as most call sites already use unsigned char*
      and the existing hashcmp is also declared to be unsigned char*.
      
      [jc: Split the patch into the "master" part, to be followed by a
       patch for merge-recursive.c which is not in "master" yet.
      
       Fixed the cast in the latter hunk to combine-diff.c which was
       wrong in the original.
      
       Also converted the leftover ones in combine-diff.c, diff-lib.c and
       upload-pack.c ]
      Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
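The kind of wrapper being introduced can be sketched as below. The `_sketch` suffix and the `HASH_RAWSZ_SKETCH` constant are hypothetical names used to avoid suggesting these are the exact definitions. The idea is simply that the literal 20 appears in one place, and that the parameters are `unsigned char *`, so `char *` callers need a cast.

```c
#include <string.h>

/* Size of a raw SHA-1 in bytes, kept in a single place. */
#define HASH_RAWSZ_SKETCH 20

/* Copy one raw hash value; replaces open-coded memcpy(a, b, 20). */
static inline void hashcpy_sketch(unsigned char *dst,
				  const unsigned char *src)
{
	memcpy(dst, src, HASH_RAWSZ_SKETCH);
}

/* Compare two raw hash values; same sign convention as memcmp(). */
static inline int hashcmp_sketch(const unsigned char *a,
				 const unsigned char *b)
{
	return memcmp(a, b, HASH_RAWSZ_SKETCH);
}
```

A call site holding a `char *` would write `hashcpy_sketch((unsigned char *)buf, sha1)`, which is the cast the commit message refers to.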
  21. 15 Aug, 2006 1 commit
  22. 03 Jul, 2006 1 commit
    • Make zlib compression level configurable, and change default. · 12f6c308
      Joachim B Haga authored
      With the change in default, "git add ." on kernel dir is about
      twice as fast as before, with only minimal (0.5%) change in
      object size. The speed difference is even more noticeable
      when committing large files, which is now up to 8 times faster.
      
      The configurability is through setting core.compression = [-1..9]
      which maps to the zlib constants; -1 is the default, 0 is no
      compression, and 1..9 are various speed/size tradeoffs, 9
      being slowest.
      
      Signed-off-by: Joachim B Haga (cjhaga@fys.uio.no)
      Acked-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  23. 20 Jun, 2006 1 commit
  24. 20 Dec, 2005 1 commit
  25. 09 Aug, 2005 1 commit
  26. 06 Jul, 2005 1 commit
  27. 28 Jun, 2005 2 commits
  28. 27 Jun, 2005 2 commits
    • csum-file interface updates: return resulting SHA1 · e1808845
      Linus Torvalds authored
      Also, make the writing of the SHA1 as an end-header conditional: not
      every user will necessarily want to write the SHA1 to the file itself,
      even though current users do (but we might end up using the same helper
      functions for the object files themselves, which don't do this).
      
      This also makes the packed index file contain the SHA1 of the packed
      data file at the end (just before its own SHA1).  That way you can
      validate the pairing of the two if you want to.
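The pairing check this layout enables can be sketched as follows. The function name is hypothetical, and the sketch assumes only what the message states: the index ends with the 20-byte SHA1 of the packed data file, followed by the index's own 20-byte SHA1. How the caller obtains the pack's SHA1 is left out.

```c
#include <stddef.h>
#include <string.h>

#define SHA1_LEN 20

/*
 * Return 1 if the index's trailer names the given pack SHA1, 0 otherwise.
 * Assumed trailer layout: [... | pack SHA1 (20) | index SHA1 (20)].
 */
int idx_names_pack(const unsigned char *idx, size_t idx_len,
		   const unsigned char *pack_sha1)
{
	if (idx_len < 2 * SHA1_LEN)
		return 0; /* too short to hold both checksums */
	return memcmp(idx + idx_len - 2 * SHA1_LEN,
		      pack_sha1, SHA1_LEN) == 0;
}
```

This only checks that the two files belong together. Verifying the index's own trailing SHA1 against its contents is a separate step.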
    • git-pack-objects: write the pack files with a SHA1 csum · c38138cd
      Linus Torvalds authored
      We want to be able to check their integrity later, and putting the
      sha1-sum of the contents at the end is a good thing.  The writing
      routines are generic, so we could try to re-use them for the index file,
      instead of having the same logic duplicated.
      
      Update unpack-objects to know about the extra 20 bytes at the end
      of the index.