1. 02 Apr, 2018 2 commits
  2. 02 Feb, 2018 2 commits
  3. 27 Sep, 2017 1 commit
    • Jeff King's avatar
      avoid looking at errno for short read_in_full() returns · 90dca671
      Jeff King authored
      When a caller tries to read a particular set of bytes via
      read_in_full(), there are three possible outcomes:
      
        1. An error, in which case -1 is returned and errno is
           set.
      
        2. A short read, in which fewer bytes are returned and
           errno is unspecified (we never saw a read error, so we
           may have some random value from whatever syscall failed
           last).
      
        3. The full read completed successfully.
      
      Many callers handle cases 1 and 2 together by just checking
      the result against the requested size. If their combined
      error path looks at errno (e.g., by calling die_errno), they
      may report a nonsense value.
      
      Let's fix these sites by having them distinguish between the
      two error cases. That avoids the random errno confusion, and
      lets us give more detailed error messages.
      Signed-off-by: default avatarJeff King <peff@peff.net>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      90dca671
  4. 08 May, 2017 1 commit
  5. 28 Mar, 2017 2 commits
    • Jeff King's avatar
      odb_mkstemp: write filename into strbuf · 594fa999
      Jeff King authored
      The odb_mkstemp() function expects the caller to provide a
      fixed buffer to write the resulting tempfile name into. But
      it creates the template using snprintf without checking the
      return value. This means we could silently truncate the
      filename.
      
      In practice, it's unlikely that the truncation would end in
      the template-pattern that mkstemp needs to open the file. So
      we'd probably end up failing either way, unless the path was
      specially crafted.
      
      The simplest fix would be to notice the truncation and die.
      However, we can observe that most callers immediately
      xstrdup() the result anyway. So instead, let's switch to
      using a strbuf, which is easier for them (and isn't a big
      deal for the other 2 callers, who can just strbuf_release
      when they're done with it).
      
      Note that many of the callers used static buffers, but this
      was purely to avoid putting a large buffer on the stack. We
      never passed the static buffers out of the function, so
      there's no complicated memory handling we need to change.
      Signed-off-by: default avatarJeff King <peff@peff.net>
      594fa999
    • Jeff King's avatar
      do not check odb_mkstemp return value for errors · 892e723a
      Jeff King authored
      The odb_mkstemp function does not return an error; it dies
      on failure instead. But many of its callers compare the
      resulting descriptor against -1 and die themselves.
      
      Mostly this is just pointless, but it does raise a question
      when looking at the callers: if they show the results of the
      "template" buffer after a failure, what's in it? The answer
      is: it doesn't matter, because it cannot happen.
      
      So let's make that clear by removing the bogus error checks.
      In bitmap_writer_finish(), we can drop the error-handling
      code entirely. In the other two cases, it's shared with the
      open() in another code path; we can just move the
      error-check next to that open() call.
      
      And while we're at it, let's flesh out the function's
      docstring a bit to make the error behavior clear.
      Signed-off-by: default avatarJeff King <peff@peff.net>
      892e723a
  6. 24 Mar, 2017 1 commit
    • Jeff King's avatar
      encode_in_pack_object_header: respect output buffer length · 7202a6fa
      Jeff King authored
      The encode_in_pack_object_header() writes a variable-length
      header to an output buffer, but it doesn't actually know
      long the buffer is. At first glance, this looks like it
      might be possible to overflow.
      
      In practice, this is probably impossible. The smallest
      buffer we use is 10 bytes, which would hold the header for
      an object up to 2^67 bytes. Obviously we're not likely to
      see such an object, but we might worry that an object could
      lie about its size (causing us to overflow before we realize
      it does not actually have that many bytes). But the argument
      is passed as a uintmax_t. Even on systems that have __int128
      available, uintmax_t is typically restricted to 64-bit by
      the ABI.
      
      So it's unlikely that a system exists where this could be
      exploited. Still, it's easy enough to use a normal out/len
      pair and make sure we don't write too far. That protects the
      hypothetical 128-bit system, makes it harder for callers to
      accidentally specify a too-small buffer, and makes the
      resulting code easier to audit.
      
      Note that the one caller in fast-import tried to catch such
      a case, but did so _after_ the call (at which point we'd
      have already overflowed!). This check can now go away.
      Signed-off-by: default avatarJeff King <peff@peff.net>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      7202a6fa
  7. 29 Sep, 2016 1 commit
  8. 29 Jul, 2016 1 commit
    • Jeff King's avatar
      sha1_file: drop free_pack_by_name · 3157c880
      Jeff King authored
      The point of this function is to drop an entry from the
      "packed_git" cache that points to a file we might be
      overwriting, because our contents may not be the same (and
      hence the only caller was pack-objects as it moved a
      temporary packfile into place).
      
      In older versions of git, this could happen because the
      names of packfiles were derived from the set of objects they
      contained, not the actual bits on disk. But since 1190a1ac
      (pack-objects: name pack files after trailer hash,
      2013-12-05), the name reflects the actual bits on disk, and
      any two packfiles with the same name can be used
      interchangeably.
      
      Dropping this function not only saves a few lines of code,
      it makes the lifetime of "struct packed_git" much easier to
      reason about: namely, we now do not ever free these structs.
      Signed-off-by: default avatarJeff King <peff@peff.net>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      3157c880
  9. 02 Sep, 2014 1 commit
  10. 03 Mar, 2014 1 commit
  11. 30 Dec, 2013 1 commit
    • Vicent Marti's avatar
      pack-objects: implement bitmap writing · 7cc8f971
      Vicent Marti authored
      This commit extends more the functionality of `pack-objects` by allowing
      it to write out a `.bitmap` index next to any written packs, together
      with the `.idx` index that currently gets written.
      
      If bitmap writing is enabled for a given repository (either by calling
      `pack-objects` with the `--write-bitmap-index` flag or by having
      `pack.writebitmaps` set to `true` in the config) and pack-objects is
      writing a packfile that would normally be indexed (i.e. not piping to
      stdout), we will attempt to write the corresponding bitmap index for the
      packfile.
      
      Bitmap index writing happens after the packfile and its index has been
      successfully written to disk (`finish_tmp_packfile`). The process is
      performed in several steps:
      
          1. `bitmap_writer_set_checksum`: this call stores the partial
             checksum for the packfile being written; the checksum will be
             written in the resulting bitmap index to verify its integrity
      
          2. `bitmap_writer_build_type_index`: this call uses the array of
             `struct object_entry` that has just been sorted when writing out
             the actual packfile index to disk to generate 4 type-index bitmaps
             (one for each object type).
      
             These bitmaps have their nth bit set if the given object is of
             the bitmap's type. E.g. the nth bit of the Commits bitmap will be
             1 if the nth object in the packfile index is a commit.
      
             This is a very cheap operation because the bitmap writing code has
             access to the metadata stored in the `struct object_entry` array,
             and hence the real type for each object in the packfile.
      
          3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap
             index for one of the packfiles we're trying to repack, this call
             will efficiently rebuild the existing bitmaps so they can be
             reused on the new index. All the existing bitmaps will be stored
             in a `reuse` hash table, and the commit selection phase will
             prioritize these when selecting, as they can be written directly
             to the new index without having to perform a revision walk to
             fill the bitmap. This can greatly speed up the repack of a
             repository that already has bitmaps.
      
          4. `bitmap_writer_select_commits`: if bitmap writing is enabled for
             a given `pack-objects` run, the sequence of commits generated
             during the Counting Objects phase will be stored in an array.
      
             We then use that array to build up the list of selected commits.
             Writing a bitmap in the index for each object in the repository
             would be cost-prohibitive, so we use a simple heuristic to pick
             the commits that will be indexed with bitmaps.
      
             The current heuristics are a simplified version of JGit's
             original implementation. We select a higher density of commits
             depending on their age: the 100 most recent commits are always
             selected, after that we pick 1 commit of each 100, and the gap
             increases as the commits grow older. On top of that, we make sure
             that every single branch that has not been merged (all the tips
             that would be required from a clone) gets their own bitmap, and
             when selecting commits between a gap, we tend to prioritize the
             commit with the most parents.
      
             Do note that there is no right/wrong way to perform commit
             selection; different selection algorithms will result in
             different commits being selected, but there's no such thing as
             "missing a commit". The bitmap walker algorithm implemented in
             `prepare_bitmap_walk` is able to adapt to missing bitmaps by
             performing manual walks that complete the bitmap: the ideal
             selection algorithm, however, would select the commits that are
             more likely to be used as roots for a walk in the future (e.g.
             the tips of each branch, and so on) to ensure a bitmap for them
             is always available.
      
          5. `bitmap_writer_build`: this is the computationally expensive part
             of bitmap generation. Based on the list of commits that were
             selected in the previous step, we perform several incremental
             walks to generate the bitmap for each commit.
      
             The walks begin from the oldest commit, and are built up
             incrementally for each branch. E.g. consider this dag where A, B,
             C, D, E, F are the selected commits, and a, b, c, e are a chunk
             of simplified history that will not receive bitmaps.
      
                  A---a---B--b--C--c--D
                           \
                            E--e--F
      
             We start by building the bitmap for A, using A as the root for a
             revision walk and marking all the objects that are reachable
             until the walk is over. Once this bitmap is stored, we reuse the
             bitmap walker to perform the walk for B, assuming that once we
             reach A again, the walk will be terminated because A has already
             been SEEN on the previous walk.
      
             This process is repeated for C, and D, but when we try to
             generate the bitmaps for E, we can reuse neither the current walk
             nor the bitmap we have generated so far.
      
             What we do now is resetting both the walk and clearing the
             bitmap, and performing the walk from scratch using E as the
             origin. This new walk, however, does not need to be completed.
             Once we hit B, we can lookup the bitmap we have already stored
             for that commit and OR it with the existing bitmap we've composed
             so far, allowing us to limit the walk early.
      
             After all the bitmaps have been generated, another iteration
             through the list of commits is performed to find the best XOR
             offsets for compression before writing them to disk. Because of
             the incremental nature of these bitmaps, XORing one of them with
             its predecesor results in a minimal "bitmap delta" most of the
             time. We can write this delta to the on-disk bitmap index, and
             then re-compose the original bitmaps by XORing them again when
             loaded.
      
             This is a phase very similar to pack-object's `find_delta` (using
             bitmaps instead of objects, of course), except the heuristics
             have been greatly simplified: we only check the 10 bitmaps before
             any given one to find best compressing one. This gives good
             results in practice, because there is locality in the ordering of
             the objects (and therefore bitmaps) in the packfile.
      
           6. `bitmap_writer_finish`: the last step in the process is
      	serializing to disk all the bitmap data that has been generated
      	in the two previous steps.
      
      	The bitmap is written to a tmp file and then moved atomically to
      	its final destination, using the same process as
      	`pack-write.c:write_idx_file`.
      Signed-off-by: default avatarVicent Marti <tanoku@gmail.com>
      Signed-off-by: default avatarJeff King <peff@peff.net>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      7cc8f971
  12. 26 Dec, 2013 1 commit
    • Jeff King's avatar
      do not pretend sha1write returns errors · 9af270e8
      Jeff King authored
      The sha1write function returns an int, but it will always be
      "0". The failure-prone parts of the function happen in the
      "flush" callback, which cannot pass an error back to us. So
      we just end up calling die() during the flush.
      
      Let's just drop the return value altogether, as it only
      confuses callers into thinking that it might be useful.
      
      Only one call site actually checked the return value. We can
      drop that check, since it just led to a die() anyway.
      Signed-off-by: default avatarJeff King <peff@peff.net>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      9af270e8
  13. 05 Dec, 2013 1 commit
    • Jeff King's avatar
      pack-objects: name pack files after trailer hash · 1190a1ac
      Jeff King authored
      Our current scheme for naming packfiles is to calculate the
      sha1 hash of the sorted list of objects contained in the
      packfile. This gives us a unique name, so we are reasonably
      sure that two packs with the same name will contain the same
      objects.
      
      It does not, however, tell us that two such packs have the
      exact same bytes. This makes things awkward if we repack the
      same set of objects. Due to run-to-run variations, the bytes
      may not be identical (e.g., changed zlib or git versions,
      different source object reuse due to new packs in the
      repository, or even different deltas due to races during a
      multi-threaded delta search).
      
      In theory, this could be helpful to a program that cares
      that the packfile contains a certain set of objects, but
      does not care about the particular representation. In
      practice, no part of git makes use of that, and in many
      cases it is potentially harmful. For example, if a dumb http
      client fetches the .idx file, it must be sure to get the
      exact .pack that matches it. Similarly, a partial transfer
      of a .pack file cannot be safely resumed, as the actual
      bytes may have changed.  This could also affect a local
      client which opened the .idx and .pack files, closes the
      .pack file (due to memory or file descriptor limits), and
      then re-opens a changed packfile.
      
      In all of these cases, git can detect the problem, as we
      have the sha1 of the bytes themselves in the pack trailer
      (which we verify on transfer), and the .idx file references
      the trailer from the matching packfile. But it would be
      simpler and more efficient to actually get the correct
      bytes, rather than noticing the problem and having to
      restart the operation.
      
      This patch simply uses the pack trailer sha1 as the pack
      name. It should be similarly unique, but covers the exact
      representation of the objects. Other parts of git should not
      care, as the pack name is returned by pack-objects and is
      essentially opaque.
      
      One test needs to be updated, because it actually corrupts a
      pack and expects that re-packing the corrupted bytes will
      use the same name. It won't anymore, but we can easily just
      use the name that pack-objects hands back.
      Signed-off-by: default avatarJeff King <peff@peff.net>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      1190a1ac
  14. 21 Dec, 2011 1 commit
    • Ævar Arnfjörð Bjarmason's avatar
      Appease Sun Studio by renaming "tmpfile" · ab1900a3
      Ævar Arnfjörð Bjarmason authored
      On Solaris the system headers define the "tmpfile" name, which'll
      cause Git compiled with Sun Studio 12 Update 1 to whine about us
      redefining the name:
      
          "pack-write.c", line 76: warning: name redefined by pragma redefine_extname declared static: tmpfile     (E_PRAGMA_REDEFINE_STATIC)
          "sha1_file.c", line 2455: warning: name redefined by pragma redefine_extname declared static: tmpfile    (E_PRAGMA_REDEFINE_STATIC)
          "fast-import.c", line 858: warning: name redefined by pragma redefine_extname declared static: tmpfile   (E_PRAGMA_REDEFINE_STATIC)
          "builtin/index-pack.c", line 175: warning: name redefined by pragma redefine_extname declared static: tmpfile    (E_PRAGMA_REDEFINE_STATIC)
      
      Just renaming the "tmpfile" variable to "tmp_file" in the relevant
      places is the easiest way to fix this.
      Signed-off-by: Ævar Arnfjörð Bjarmason's avatarÆvar Arnfjörð Bjarmason <avarab@gmail.com>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      ab1900a3
  15. 17 Nov, 2011 1 commit
  16. 28 Oct, 2011 3 commits
  17. 28 Feb, 2011 4 commits
    • Junio C Hamano's avatar
      index-pack --verify: read anomalous offsets from v2 idx file · 3c9fc074
      Junio C Hamano authored
      A pack v2 .idx file usually records offset using 64-bit representation
      only when the offset does not fit within 31-bit, but you can handcraft
      your .idx file to record smaller offset using 64-bit, storing all zero
      in the upper 4-byte.  By inspecting the original idx file when running
      index-pack --verify, encode such low offsets that do not need to be in
      64-bit but are encoded using 64-bit just like the original idx file so
      that we can still validate the pack/idx pair by comparing the idx file
      recomputed with the original.
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      3c9fc074
    • Junio C Hamano's avatar
      fb956c1f
    • Junio C Hamano's avatar
      index-pack: --verify · e337a04d
      Junio C Hamano authored
      Given an existing .pack file and the .idx file that describes it,
      this new mode of operation reads and re-index the packfile and makes
      sure the existing .idx file matches the result byte-for-byte.
      
      All the objects in the .pack file are validated during this operation as
      well.  Unlike verify-pack, which visits each object described in the .idx
      file in the SHA-1 order, index-pack efficiently exploits the delta-chain
      to avoid rebuilding the objects that are used as the base of deltified
      objects over and over again while validating the objects, resulting in
      much quicker verification of the .pack file and its .idx file.
      
      This version however cannot verify a .pack/.idx pair with a handcrafted v2
      index that uses 64-bit offset representation for offsets that would fit
      within 31-bit. You can create such an .idx file by giving a custom offset
      to --index-version option to the command.
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      e337a04d
    • Junio C Hamano's avatar
      write_idx_file: introduce a struct to hold idx customization options · ebcfb379
      Junio C Hamano authored
      Remove two globals, pack_idx_default version and pack_idx_off32_limit,
      and place them in a pack_idx_option structure.  Allow callers to pass
      it to write_idx_file() as a parameter.
      
      Adjust all callers to the API change.
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      ebcfb379
  18. 23 Feb, 2010 1 commit
    • Micolas Pitre's avatar
      move encode_in_pack_object_header() to a better place · f965c525
      Micolas Pitre authored
      Commit 1b22b6c8 made duplicated versions of encode_header() into a
      common version called encode_in_pack_object_header(). There is however
      a better location that sha1_file.c for such a function though, as
      sha1_file.c contains nothing related to the creation of packs, and
      it is quite populated already.
      
      Also the comment that was moved to the header file should really remain
      near the function as it covers implementation details and provides no
      information about the actual function interface.
      Signed-off-by: Micolas Pitre's avatarNicolas Pitre <nico@fluxnic.net>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      f965c525
  19. 22 Jan, 2010 1 commit
  20. 27 Jun, 2009 1 commit
  21. 25 Feb, 2009 1 commit
    • Junio C Hamano's avatar
      Make sure objects/pack exists before creating a new pack · 6e180cdc
      Junio C Hamano authored
      In a repository created with git older than f49fb35d (git-init-db: create
      "pack" subdirectory under objects, 2005-06-27), objects/pack/ directory is
      not created upon initialization.  It was Ok because subdirectories are
      created as needed inside directories init-db creates, and back then,
      packfiles were recent invention.
      
      After the said commit, new codepaths started relying on the presense of
      objects/pack/ directory in the repository.  This was exacerbated with
      8b4eb6b6 (Do not perform cross-directory renames when creating packs,
      2008-09-22) that moved the location temporary pack files are created from
      objects/ directory to objects/pack/ directory, because moving temporary to
      the final location was done carefully with lazy leading directory creation.
      
      Many packfile related operations in such an old repository can fail
      mysteriously because of this.
      
      This commit introduces two helper functions to make things work better.
      
       - odb_mkstemp() is a specialized version of mkstemp() to refactor the
         code and teach it to create leading directories as needed;
      
       - odb_pack_keep() refactors the code to create a ".keep" file while
         create leading directories as needed.
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      6e180cdc
  22. 03 Oct, 2008 1 commit
    • Nicolas Pitre's avatar
      fix openssl headers conflicting with custom SHA1 implementations · 9126f009
      Nicolas Pitre authored
      On ARM I have the following compilation errors:
      
          CC fast-import.o
      In file included from cache.h:8,
                       from builtin.h:6,
                       from fast-import.c:142:
      arm/sha1.h:14: error: conflicting types for 'SHA_CTX'
      /usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here
      arm/sha1.h:16: error: conflicting types for 'SHA1_Init'
      /usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here
      arm/sha1.h:17: error: conflicting types for 'SHA1_Update'
      /usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here
      arm/sha1.h:18: error: conflicting types for 'SHA1_Final'
      /usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here
      make: *** [fast-import.o] Error 1
      
      This is because openssl header files are always included in
      git-compat-util.h since commit 684ec6c6 whenever NO_OPENSSL is not
      set, which somehow brings in <openssl/sha1.h> clashing with the custom
      ARM version.  Compilation of git is probably broken on PPC too for the
      same reason.
      
      Turns out that the only file requiring openssl/ssl.h and openssl/err.h
      is imap-send.c.  But only moving those problematic includes there
      doesn't solve the issue as it also includes cache.h which brings in the
      conflicting local SHA1 header file.
      
      As suggested by Jeff King, the best solution is to rename our references
      to SHA1 functions and structure to something git specific, and define those
      according to the implementation used.
      Signed-off-by: default avatarNicolas Pitre <nico@cam.org>
      Signed-off-by: default avatarShawn O. Pearce <spearce@spearce.org>
      9126f009
  23. 22 Sep, 2008 1 commit
  24. 30 Aug, 2008 2 commits
  25. 27 Aug, 2008 1 commit
  26. 26 Jun, 2008 1 commit
  27. 31 May, 2008 1 commit
  28. 04 May, 2008 1 commit
  29. 17 Oct, 2007 1 commit
    • Nicolas Pitre's avatar
      fix const issues with some functions · 4049b9cf
      Nicolas Pitre authored
      Two functions, namely write_idx_file() and open_pack_file(), currently
      return a const pointer.  However that pointer is either a copy of the
      first argument, or set to a malloc'd buffer when that first argument
      is null.  In the later case it is wrong to qualify that pointer as const
      since ownership of the buffer is transferred to the caller to dispose of,
      and obviously the free() function is not meant to be passed const
      pointers.
      
      Making the return pointer not const causes a warning when the first
      argument is returned since that argument is also marked const.
      
      The correct thing to do is therefore to remove the const qualifiers,
      avoiding the need for ugly casts only to silence some warnings.
      Signed-off-by: default avatarNicolas Pitre <nico@cam.org>
      Signed-off-by: default avatarShawn O. Pearce <spearce@spearce.org>
      4049b9cf
  30. 19 Sep, 2007 1 commit
    • Shawn O. Pearce's avatar
      Refactor index-pack "keep $sha1" handling for reuse · 106764e6
      Shawn O. Pearce authored
      There is a subtle (but important) linkage between receive-pack and
      index-pack that allows index-pack to create a packfile but protect
      it from being deleted by a concurrent `git repack -a -d` operation.
      The linkage works by having index-pack mark the newly created pack
      with a ".keep" file and then it passes the SHA-1 name of that new
      packfile to receive-pack along its stdout channel.
      
      The receive-pack process must unkeep the packfile by deleting the
      .keep file, but can it can only do so after all elgible refs have
      been updated in the receiving repository.  This ensures that the
      packfile is either kept or its objects are reachable, preventing
      a concurrent repacker from deleting the packfile before it can
      determine that its objects are actually needed by the repository.
      
      The new builtin-fetch code needs to perform the same actions if
      it choose to run index-pack rather than unpack-objects, so I am
      moving this code out to its own function where both receive-pack
      and fetch-pack are able to invoke it when necessary.  The caller
      is responsible for deleting the returned ".keep" and freeing the
      path if the returned path is not NULL.
      Signed-off-by: default avatarShawn O. Pearce <spearce@spearce.org>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      106764e6
  31. 15 Aug, 2007 1 commit