1. 15 May, 2006 1 commit
      Fix pack-index issue on 64-bit platforms a bit more portably. · 1b9bc5a7
      Junio C Hamano authored
      Apparently <stdint.h> is not enough for uint32_t on OpenBSD; use
      "unsigned int" -- hopefully that would stay 32-bit on every
      platform we care about, at least until we update the pack-index
      file format.
      
      Our sha1 routines optimized for specific architectures use uint32_t
      and expect '#include <stdint.h>' to be enough, so OpenBSD on arm or
      ppc might have similar issues down the road, I dunno.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
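The assumption that "unsigned int" stays 32 bits wide can be checked at compile time rather than hoped for. A minimal sketch, assuming a C11 compiler; `pack_offset_t` and `pack_offset_bits()` are hypothetical names for illustration, not identifiers from git.

```c
#include <assert.h>	/* static_assert (C11) */
#include <limits.h>	/* CHAR_BIT */

/* Hypothetical alias for the type used to read pack-index offsets. */
typedef unsigned int pack_offset_t;

/* Fail the build on any platform where the type is not exactly 32 bits. */
static_assert(sizeof(pack_offset_t) * CHAR_BIT == 32,
	      "pack-index offsets must be read through a 32-bit type");

/* Width in bits, for callers that prefer a run-time check. */
unsigned int pack_offset_bits(void)
{
	return (unsigned int)(sizeof(pack_offset_t) * CHAR_BIT);
}
```

With such a check in place, a platform with a wider "unsigned int" would fail to build instead of silently misreading the index.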
  2. 14 May, 2006 1 commit
  3. 13 May, 2006 1 commit
      Fix git-pack-objects for 64-bit platforms · 66561f5a
      Dennis Stosberg authored
      The offset of an object in the pack is recorded as a 4-byte integer
      in the index file.  When reading the offset from the mmap'ed index
      in prepare_pack_revindex(), the address is dereferenced as a long*.
      This works fine as long as the long type is four bytes wide.  On
      NetBSD/sparc64, however, a long is 8 bytes wide and so dereferencing
      the offset produces garbage.
      
      [jc: taking suggestion by Linus to use uint32_t]
      Signed-off-by: Dennis Stosberg <dennis@stosberg.net>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
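The width problem described above goes away if the 4-byte offset is assembled byte by byte instead of being read through a pointer to a wider type. A minimal sketch, assuming the offset is stored in network (big-endian) byte order; `read_be32()` is a hypothetical helper, not the function the commit changed.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Read a 4-byte big-endian integer from a byte buffer.  Building the
 * value from single bytes does not depend on sizeof(long), on the
 * alignment of p, or on the host byte order.
 */
unsigned long read_be32(const unsigned char *p)
{
	assert(p != NULL);
	return ((unsigned long)p[0] << 24) |
	       ((unsigned long)p[1] << 16) |
	       ((unsigned long)p[2] << 8) |
	       (unsigned long)p[3];
}
```

Dereferencing the same address as an 8-byte long would also pull in the four bytes that follow the offset, which is the garbage the commit message refers to.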
  4. 21 Apr, 2006 2 commits
  5. 07 Apr, 2006 1 commit
      Thin pack generation: optimization. · 5379a5c5
      Junio C Hamano authored
      Jens Axboe noticed that recent "git push" has become very slow
      since we made --thin transfer the default.
      
      Thin pack generation to push a handful of revisions that touch a
      relatively small number of paths out of a huge tree was stupid; it
      registered _everything_ from the excluded revisions.  As a
      result, the "Counting objects" phase was unnecessarily expensive.
      
      This changes the logic to register the blobs and trees from
      excluded revisions only for paths we are actually going to send
      to the other end.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
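The filtering idea in this commit can be sketched as follows: entries from excluded revisions are registered only when their path is also among the paths being sent. This is an illustration under simplifying assumptions (paths as plain strings, a linear scan); `path_is_sent()` and `count_registered()` are hypothetical names, and the real code may well key on something other than full path strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if path is among the n paths we are going to send. */
static int path_is_sent(const char *const *sent, size_t n, const char *path)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(sent[i], path) == 0)
			return 1;
	return 0;
}

/*
 * Count how many entries of an excluded revision would be registered
 * as potential delta bases: only those whose path is also being sent.
 */
size_t count_registered(const char *const *sent, size_t nsent,
			const char *const *excluded, size_t nexcluded)
{
	size_t i, kept = 0;

	assert(sent != NULL && excluded != NULL);
	for (i = 0; i < nexcluded; i++)
		if (path_is_sent(sent, nsent, excluded[i]))
			kept++;
	return kept;
}
```

For a push touching two paths out of a tree with thousands, only the two matching entries per excluded revision are registered instead of the whole tree.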
  6. 04 Apr, 2006 2 commits
  7. 02 Apr, 2006 2 commits
  8. 30 Mar, 2006 1 commit
      tree/diff header cleanup. · 1b0c7174
      Junio C Hamano authored
      Introduce tree-walk.[ch] and move "struct tree_desc" and
      associated functions from various places.
      
      Rename DIFF_FILE_CANON_MODE(mode) macro to canon_mode(mode) and
      move it to cache.h.  This macro returns the canonicalized
      st_mode value in the host byte order for files, symlinks and
      directories -- to be compared with a tree_desc entry.
      create_ce_mode(mode) in cache.h is similar but is intended to be
      used for index entries (so it does not work for directories) and
      returns the value in the network byte order.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
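What "canonicalized st_mode" means can be illustrated with a small function. This is a sketch, not the macro from cache.h: it assumes that regular files keep only a collapsed permission set (0755 if owner-executable, else 0644) and that symlinks and directories keep only their type bits. `perm_sketch()` and `canon_mode_sketch()` are hypothetical names.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* expose S_IFLNK etc. under strict -std modes */
#endif
#include <assert.h>
#include <sys/stat.h>

/* Collapse permission bits to 0755 or 0644 based on the owner-execute bit. */
static unsigned int perm_sketch(unsigned int mode)
{
	return (mode & 0100) ? 0755 : 0644;
}

/*
 * Canonical mode in host byte order.  Regular files keep the collapsed
 * permissions; symlinks and directories keep only the file type.
 */
unsigned int canon_mode_sketch(unsigned int mode)
{
	assert((mode & S_IFMT) != 0);	/* a file type must be present */
	if (S_ISREG(mode))
		return S_IFREG | perm_sketch(mode);
	if (S_ISLNK(mode))
		return S_IFLNK;
	return S_IFDIR;
}
```

A variant intended for index entries would additionally convert the result to network byte order (for example with htonl()) and would not need the directory case.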
  9. 06 Mar, 2006 1 commit
      pack-objects: simplify "thin" pack. · 70ca1a3f
      Junio C Hamano authored
      There was misguided logic that overly preferred objects
      we are not going to pack as the base object.  This was
      unnecessary.  It does not matter to the unpacking side where the
      base object is -- it matters more to make the resulting delta
      smaller.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  10. 02 Mar, 2006 2 commits
      diff-delta: allow reusing of the reference buffer index · 38fd0721
      Nicolas Pitre authored
      When a reference buffer is used multiple times then its index can be
      computed only once and reused multiple times.  This patch adds an extra
      pointer to a pointer argument (from_index) to diff_delta() for this.
      
      If from_index is NULL then everything is like before.
      
      If from_index is non-NULL and *from_index is NULL then the index is
      created and its location stored to *from_index.  In this case the caller
      has the responsibility to free the memory pointed to by *from_index.
      
      If from_index and *from_index are non-NULL then the index is reused as
      is.
      
      This currently saves about 10% of CPU time to repack the git archive.
      Signed-off-by: Nicolas Pitre <nico@cam.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
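The three from_index cases described above follow a common pointer-to-pointer caching pattern. The sketch below shows only the ownership rules; the delta computation is elided, and `struct ref_index`, `build_index()`, `diff_sketch()` and the `builds` counter are hypothetical stand-ins rather than the actual diff_delta() internals.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical index type; a real one would hold hash buckets. */
struct ref_index {
	size_t len;
};

static int builds;	/* counts how often an index was computed */

static struct ref_index *build_index(const void *buf, size_t len)
{
	struct ref_index *idx = malloc(sizeof(*idx));

	(void)buf;	/* a real index would hash the buffer contents */
	if (idx) {
		idx->len = len;
		builds++;
	}
	return idx;
}

/* Returns 0 on success, -1 if the index could not be allocated. */
int diff_sketch(const void *from, size_t from_len,
		struct ref_index **from_index)
{
	struct ref_index *idx;

	assert(from != NULL);
	if (from_index && *from_index) {
		idx = *from_index;		/* reuse as is */
	} else {
		idx = build_index(from, from_len);
		if (!idx)
			return -1;
		if (from_index)
			*from_index = idx;	/* caller frees it later */
	}

	/* ... compute the delta using idx ... */

	if (!from_index)
		free(idx);			/* private index: free now */
	return 0;
}
```

A caller that deltifies many targets against one reference passes the address of a NULL-initialized pointer on every call and frees it once at the end, so the index is built only on the first call.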
      Re-fix compilation warnings. · 2b74cffa
      Luck, Tony authored
      Commit 8fcf1ad9 has a combination of double cast and Andreas'
      switch to using unsigned long ... just the latter is sufficient
      (and a lot less ugly than using the double cast).
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  11. 26 Feb, 2006 1 commit
  12. 25 Feb, 2006 1 commit
      fix warning from pack-objects.c · 8fcf1ad9
      Luck, Tony authored
      When compiling on ia64 I get this warning (from gcc 3.4.3):
      
      gcc -o pack-objects.o -c -g -O2 -Wall -DSHA1_HEADER='<openssl/sha.h>'  pack-objects.c
      pack-objects.c: In function `pack_revindex_ix':
      pack-objects.c:94: warning: cast from pointer to integer of different size
      
      A double cast (first to long, then to int) shuts gcc up, but is there
      a better way?
      
      [jc: Andreas Ericsson suggests to use ulong instead. ]
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
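The warning arises when a 64-bit pointer is cast straight to a 32-bit int. Casting to an integer type as wide as the pointer first, and narrowing only after the modulo, avoids it. A minimal sketch of hashing a pointer into a table this way; `ptr_bucket()` is a hypothetical helper, not the body of pack_revindex_ix().

```c
#include <assert.h>

/*
 * Map a pointer to a bucket in a table of size slots.  unsigned long
 * matches the pointer width on LP64 targets such as ia64; uintptr_t
 * from <stdint.h> is the portable choice where long is narrower.
 */
unsigned int ptr_bucket(const void *p, unsigned int size)
{
	unsigned long ui = (unsigned long)p;

	assert(size > 0);
	ui ^= ui >> 16;		/* mix in higher bits to offset alignment */
	return (unsigned int)(ui % size);
}
```

The final narrowing cast is safe because the value is already smaller than size.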
  13. 24 Feb, 2006 2 commits
      pack-objects: hash basename and dirname a bit differently. · eeef7135
      Junio C Hamano authored
      ...so that "Makefile"s from different revs are sorted together,
      separate from "t/Makefile"s, but close enough.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
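One way to get the sort order described above is to put the basename hash in the high bits of the key and the directory hash in the low bits. The sketch below illustrates that idea only; the bit split, the string hash, and the names `DIR_BITS`, `str_hash()` and `path_sort_hash()` are assumptions, not the hash this commit introduced.

```c
#include <assert.h>
#include <string.h>

#define DIR_BITS 12	/* assumed split: low bits carry the directory */

/* Simple multiplicative string hash over the first n bytes. */
static unsigned int str_hash(const char *s, size_t n)
{
	unsigned int h = 0;
	size_t i;

	for (i = 0; i < n; i++)
		h = h * 31 + (unsigned char)s[i];
	return h;
}

/* Sort key: basename hash in the high bits, directory hash in the low bits. */
unsigned int path_sort_hash(const char *path)
{
	const char *slash, *base;
	size_t dirlen;

	assert(path != NULL);
	slash = strrchr(path, '/');
	base = slash ? slash + 1 : path;
	dirlen = slash ? (size_t)(slash - path) : 0;
	return (str_hash(base, strlen(base)) << DIR_BITS) |
	       (str_hash(path, dirlen) & ((1u << DIR_BITS) - 1));
}
```

Sorting by this key places every "Makefile" next to the others because the high bits agree, while the low bits keep "Makefile" and "t/Makefile" in separate runs.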
      pack-objects: allow "thin" packs to exceed depth limits · b76f6b62
      Junio C Hamano authored
      When creating a new pack to be used in .git/objects/pack/
      directory, we carefully count the depth of deltified objects to
      be reused, so that the generated pack does not exceed the
      specified depth limit for runtime efficiency.  However, when we
      are generating a thin pack that does not contain base objects,
      such a pack can only be used during network transfer that is
      expanded on the other end upon reception, so being careful and
      artificially cutting the delta chain does not buy us anything
      except increased bandwidth requirement.  This patch disables the
      delta chain depth limit check when reusing an existing delta.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
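The rule in this commit amounts to skipping the depth test when the pack is thin. A minimal sketch, assuming the limit is inclusive; `may_reuse_delta()` and its parameters are hypothetical names.

```c
#include <assert.h>

/*
 * Decide whether an existing delta at chain_depth may be reused.
 * For a thin pack the receiver expands everything on arrival, so
 * the depth limit is not applied.
 */
int may_reuse_delta(unsigned int chain_depth, unsigned int max_depth, int thin)
{
	assert(max_depth > 0);
	if (thin)
		return 1;
	return chain_depth <= max_depth;	/* assumed inclusive limit */
}
```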
  14. 23 Feb, 2006 3 commits
  15. 22 Feb, 2006 6 commits
      also adds progress when actually writing a pack · 5e8dc750
      Nicolas Pitre authored
      If that pack is big, it takes significant time to write and might
      benefit from some more eye candies as well.  This is however disabled
      when the pack is written to stdout since in that case the output is
      usually piped into unpack_objects which already does its own progress
      reporting.
      Signed-off-by: Nicolas Pitre <nico@cam.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      nicer eye candies for pack-objects · b2504a0d
      Nicolas Pitre authored
      This provides a stable and simpler progress reporting mechanism that
      updates progress as often as possible, but no more than once a
      second.  The deltification phase is also made more
      interesting to watch (since repacking a big repository and only seeing a
      dot appear once every many seconds is rather boring and doesn't provide
      much food for anticipation).
      Signed-off-by: Nicolas Pitre <nico@cam.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
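A once-a-second limit on progress output can be sketched as a small throttle that takes the current time as a parameter. This shows one possible mechanism and is not necessarily the one the commit uses; `struct progress_sketch` and `progress_due()` are hypothetical names.

```c
#include <assert.h>
#include <time.h>

struct progress_sketch {
	time_t last;		/* time of the last report */
	unsigned int reports;	/* number of reports emitted so far */
};

/* Returns 1 if a progress line should be printed at time now. */
int progress_due(struct progress_sketch *p, time_t now)
{
	assert(p != NULL);
	if (p->reports && now - p->last < 1)
		return 0;	/* already reported within this second */
	p->last = now;
	p->reports++;
	return 1;
}
```

A packing loop would call progress_due(&p, time(NULL)) after each object and print the counter only when it returns 1.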
      pack-objects: avoid delta chains that are too long. · 15b4d577
      Junio C Hamano authored
      This tries to rework the solution for the excess delta chain
      problem. An earlier commit worked around it "cheaply", but
      repeated repacking risks unbounded growth of delta chains.
      
      This version counts the length of delta chain we are reusing
      from the existing pack, and makes sure a base object that has
      sufficiently long delta chain does not get deltified.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
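Counting the reused chain length can be sketched by walking from each reused delta up to its root and recording, on every base along the way, the longest chain that sits on top of it. The record layout and the names `struct obj`, `note_reused_chain()` and `may_deltify()` are hypothetical; the inclusive limit is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object record; base is set when an existing delta is reused. */
struct obj {
	struct obj *base;
	unsigned int chain_above;	/* longest reused chain built on this object */
};

/* Call once per reused delta: walk to the root, recording chain lengths. */
void note_reused_chain(struct obj *leaf)
{
	unsigned int len = 0;
	struct obj *o;

	assert(leaf != NULL);
	for (o = leaf; o->base; o = o->base) {
		len++;
		if (o->base->chain_above < len)
			o->base->chain_above = len;
	}
}

/* May o be stored as a delta at new_depth without exceeding max_depth? */
int may_deltify(const struct obj *o, unsigned int new_depth,
		unsigned int max_depth)
{
	return new_depth + o->chain_above <= max_depth;
}
```

For a reused chain A<--B<--C<--D, A ends up with chain_above == 3, so with a limit of 10 it may itself become a delta at depth 7 but not at depth 8.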
      pack-objects: finishing touches. · ab7cd7bb
      Junio C Hamano authored
      This introduces --no-reuse-delta option to disable reusing of
      existing delta, which is a large part of the optimization
      introduced by this series.  This may become necessary if
      repeated repacking makes delta chain too long.  With this, the
      output of the command becomes identical to that of the older
      implementation.  But the performance suffers greatly.
      
      It still allows reusing non-deltified representations; there is
      no point uncompressing and recompressing the whole text.
      
      It also adds a couple more statistics to the output, while
      squelching it under the -q flag, which the last round forgot to do.
      
        $ time old-git-pack-objects --stdout >/dev/null <RL
        Generating pack...
        Done counting 184141 objects.
        Packing 184141 objects....................
        real    12m8.530s       user    11m1.450s       sys     0m57.920s
        $ time git-pack-objects --stdout >/dev/null <RL
        Generating pack...
        Done counting 184141 objects.
        Packing 184141 objects.....................
        Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081)
        real    0m59.549s       user    0m56.670s       sys     0m2.400s
        $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL
        Generating pack...
        Done counting 184141 objects.
        Packing 184141 objects.....................
        Total 184141, written 184141 (delta 134833), reused 47904 (delta 0)
        real    11m13.830s      user    9m45.240s       sys     0m44.330s
      
      There is one remaining issue when --no-reuse-delta option is not
      used.  It can create delta chains that are deeper than specified.
      
          A<--B<--C<--D   E   F   G
      
      Suppose we have a delta chain A to D (A is stored in full either
      in a pack or as a loose object. B is depth1 delta relative to A,
      C is depth2 delta relative to B...) with loose objects E, F, G.
      And we are going to pack all of them.
      
      B, C and D are left as delta against A, B and C respectively.
      So A, E, F, and G are examined for deltification, and let's say
      we decided to keep E expanded, and store the rest as deltas like
      this:
      
          E<--F<--G<--A
      
      Oops.  We ended up making D a bit too deep, didn't we?  B, C and
      D form a chain on top of A!
      
      This is because we did not know what the final depth of A would
      be, when we checked objects and decided to keep the existing
      delta.  Unfortunately, deferring the decision until just before
      the deltification is not an option.  To be able to make B, C,
      and D candidates for deltification with the rest, we need to
      know the type and final unexpanded size of them, but the major
      part of the optimization comes from the fact that we do not read
      the delta data to do so -- getting the final size is quite an
      expensive operation.
      
      To prevent this from happening, we should keep A from being
      deltified.  But how would we tell that, cheaply?
      
      To do this most precisely, after check_object() runs, each
      object that is used as the base object of some existing delta
      needs to be marked with the maximum depth of the objects we
      decided to keep deltified (in this case, D is depth 3 relative
      to A, so if no other delta chain that is longer than 3 based on
      A exists, mark A with 3).  Then when attempting to deltify A, we
      would take that number into account to see if the final delta
      chain that leads to D becomes too deep.
      
      However, this is a bit cumbersome to compute, so we would cheat
      and reduce the maximum depth for A arbitrarily to depth/4 in
      this implementation.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
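The depth/4 shortcut described above can be written as a one-line adjustment of the limit applied to an object that already serves as the base of a reused delta. A minimal sketch; `depth_limit_for()` is a hypothetical name and the limit of 10 in the usage note is only an example value.

```c
#include <assert.h>

/*
 * Depth limit to apply when trying to deltify an object.  An object
 * that is already the base of a reused delta gets a quarter of the
 * normal limit, leaving room for the chain that sits on top of it.
 */
unsigned int depth_limit_for(unsigned int max_depth, int is_reused_base)
{
	assert(max_depth > 0);
	return is_reused_base ? max_depth / 4 : max_depth;
}
```

With a limit of 10, A in the example may sink to depth 2 at most, so D ends up no deeper than 5. This is a heuristic rather than a guarantee, since it does not look at how long the chain above A actually is.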
      pack-objects: reuse data from existing packs. · 3f9ac8d2
      Junio C Hamano authored
      When generating a new pack, notice if we already have the needed
      objects in existing packs.  If an object is stored deltified,
      and its base object is also what we are going to pack, then
      reuse the existing deltified representation unconditionally,
      bypassing all the expensive find_deltas() and try_deltas()
      calls.
      
      Also, notice if what we are going to write out exactly matches
      what is already in an existing pack (either deltified or just
      compressed).  In such a case, we can just copy it instead of
      going through the usual uncompressing & recompressing cycle.
      
      Without this patch, in a linux-2.6 repository with about 1500
      loose objects and a single mega pack:
      
          $ git-rev-list --objects v2.6.16-rc3 >RL
          $ wc -l RL
          184141 RL
          $ time git-pack-objects p <RL
          Generating pack...
          Done counting 184141 objects.
          Packing 184141 objects....................
          a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
      
          real    12m4.323s
          user    11m2.560s
          sys     0m55.950s
      
      With this patch, the same input:
      
          $ time ../git.junio/git-pack-objects q <RL
          Generating pack...
          Done counting 184141 objects.
          Packing 184141 objects.....................
          a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
          Total 184141, written 184141, reused 182441
      
          real    1m2.608s
          user    0m55.090s
          sys     0m1.830s
      Signed-off-by: Junio C Hamano <junkio@cox.net>
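The two reuse cases in this commit can be summarized as a small decision function. This is a sketch of the decision only, under the assumption that three facts are known per object; `struct packed_obj`, `enum reuse` and `reuse_kind()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

enum reuse {
	REUSE_NONE,	/* go through the normal delta search and compression */
	REUSE_DELTA,	/* copy the existing delta, skipping the delta search */
	REUSE_COPY	/* copy the existing compressed, non-delta data */
};

/* Hypothetical summary of what is known about one object. */
struct packed_obj {
	int in_pack;		/* already stored in an existing pack? */
	int is_delta;		/* stored there as a delta? */
	int base_in_list;	/* is its delta base also being packed? */
};

enum reuse reuse_kind(const struct packed_obj *o)
{
	assert(o != NULL);
	if (!o->in_pack)
		return REUSE_NONE;
	if (o->is_delta)
		return o->base_in_list ? REUSE_DELTA : REUSE_NONE;
	return REUSE_COPY;
}
```

In the timing above, 182441 of 184141 objects fell into one of the two reuse cases, which is where the drop from about 12 minutes to about 1 minute comes from.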
      relax delta selection filtering in pack-objects · cac251d0
      Nicolas Pitre authored
      This change provides an 8% saving on the pack size with a 4% CPU time
      increase for git-repack -a on the current git archive.
      Signed-off-by: Nicolas Pitre <nico@cam.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  16. 20 Feb, 2006 1 commit
      Thin pack - create packfile with missing delta base. · 7a979d99
      Junio C Hamano authored
      This goes together with the "rev-list --objects-edge" change, to feed
      pack-objects a list of edge commits in addition to the usual
      object list.  Upon seeing such a list, pack-objects loosens the
      usual "self contained delta" constraints, and can produce deltas
      against blobs and trees contained in the edge commits without
      storing the delta base objects themselves.
      
      The resulting packfile is not usable in .git/objects/pack, but
      is a good way to implement "delta-only" transfer.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  17. 18 Feb, 2006 1 commit
      pack-objects: avoid delta chains that are too long. · e4c9327a
      Junio C Hamano authored
      This tries to rework the solution for the excess delta chain
      problem. An earlier commit worked around it "cheaply", but
      repeated repacking risks unbounded growth of delta chains.
      
      This version counts the length of delta chain we are reusing
      from the existing pack, and makes sure a base object that has
      sufficiently long delta chain does not get deltified.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  18. 17 Feb, 2006 2 commits
      pack-objects: finishing touches. · ca5381d4
      Junio C Hamano authored
      This introduces --no-reuse-delta option to disable reusing of
      existing delta, which is a large part of the optimization
      introduced by this series.  This may become necessary if
      repeated repacking makes delta chain too long.  With this, the
      output of the command becomes identical to that of the older
      implementation.  But the performance suffers greatly.
      
      It still allows reusing non-deltified representations; there is
      no point uncompressing and recompressing the whole text.
      
      It also adds a couple more statistics to the output, while
      squelching it under the -q flag, which the last round forgot to do.
      
        $ time old-git-pack-objects --stdout >/dev/null <RL
        Generating pack...
        Done counting 184141 objects.
        Packing 184141 objects....................
        real    12m8.530s       user    11m1.450s       sys     0m57.920s
        $ time git-pack-objects --stdout >/dev/null <RL
        Generating pack...
        Done counting 184141 objects.
        Packing 184141 objects.....................
        Total 184141, written 184141 (delta 138297), reused 178833 (delta 134081)
        real    0m59.549s       user    0m56.670s       sys     0m2.400s
        $ time git-pack-objects --stdout --no-reuse-delta >/dev/null <RL
        Generating pack...
        Done counting 184141 objects.
        Packing 184141 objects.....................
        Total 184141, written 184141 (delta 134833), reused 47904 (delta 0)
        real    11m13.830s      user    9m45.240s       sys     0m44.330s
      
      There is one remaining issue when --no-reuse-delta option is not
      used.  It can create delta chains that are deeper than specified.
      
          A<--B<--C<--D   E   F   G
      
      Suppose we have a delta chain A to D (A is stored in full either
      in a pack or as a loose object. B is depth1 delta relative to A,
      C is depth2 delta relative to B...) with loose objects E, F, G.
      And we are going to pack all of them.
      
      B, C and D are left as delta against A, B and C respectively.
      So A, E, F, and G are examined for deltification, and let's say
      we decided to keep E expanded, and store the rest as deltas like
      this:
      
          E<--F<--G<--A
      
      Oops.  We ended up making D a bit too deep, didn't we?  B, C and
      D form a chain on top of A!
      
      This is because we did not know what the final depth of A would
      be, when we checked objects and decided to keep the existing
      delta.  Unfortunately, deferring the decision until just before
      the deltification is not an option.  To be able to make B, C,
      and D candidates for deltification with the rest, we need to
      know the type and final unexpanded size of them, but the major
      part of the optimization comes from the fact that we do not read
      the delta data to do so -- getting the final size is quite an
      expensive operation.
      
      To prevent this from happening, we should keep A from being
      deltified.  But how would we tell that, cheaply?
      
      To do this most precisely, after check_object() runs, each
      object that is used as the base object of some existing delta
      needs to be marked with the maximum depth of the objects we
      decided to keep deltified (in this case, D is depth 3 relative
      to A, so if no other delta chain that is longer than 3 based on
      A exists, mark A with 3).  Then when attempting to deltify A, we
      would take that number into account to see if the final delta
      chain that leads to D becomes too deep.
      
      However, this is a bit cumbersome to compute, so we would cheat
      and reduce the maximum depth for A arbitrarily to depth/4 in
      this implementation.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
      pack-objects: reuse data from existing packs. · a49dd05f
      Junio C Hamano authored
      When generating a new pack, notice if we already have the needed
      objects in existing packs.  If an object is stored deltified,
      and its base object is also what we are going to pack, then
      reuse the existing deltified representation unconditionally,
      bypassing all the expensive find_deltas() and try_deltas()
      calls.
      
      Also, notice if what we are going to write out exactly matches
      what is already in an existing pack (either deltified or just
      compressed).  In such a case, we can just copy it instead of
      going through the usual uncompressing & recompressing cycle.
      
      Without this patch, in a linux-2.6 repository with about 1500
      loose objects and a single mega pack:
      
          $ git-rev-list --objects v2.6.16-rc3 >RL
          $ wc -l RL
          184141 RL
          $ time git-pack-objects p <RL
          Generating pack...
          Done counting 184141 objects.
          Packing 184141 objects....................
          a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
      
          real    12m4.323s
          user    11m2.560s
          sys     0m55.950s
      
      With this patch, the same input:
      
          $ time ../git.junio/git-pack-objects q <RL
          Generating pack...
          Done counting 184141 objects.
          Packing 184141 objects.....................
          a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
          Total 184141, written 184141, reused 182441
      
          real    1m2.608s
          user    0m55.090s
          sys     0m1.830s
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  19. 12 Feb, 2006 2 commits
  20. 29 Dec, 2005 1 commit
  21. 08 Dec, 2005 1 commit
  22. 29 Nov, 2005 1 commit
  23. 21 Nov, 2005 1 commit
  24. 26 Oct, 2005 1 commit
      pack-objects: Allow use of pre-generated pack. · f3123c4a
      Junio C Hamano authored
      git-pack-objects can reuse pack files stored in the $GIT_DIR/pack-cache
      directory, when a necessary pack is found.  This is hopefully useful
      when upload-pack (called from git-daemon) is expected to receive
      requests for the same set of objects many times (e.g. a full clone
      request of any project, or updates from the previous day's set of
      heads to the latest for a slow moving project).
      
      Currently git-pack-objects does *not* keep pack files it creates for
      reusing.  It might be useful to add an --update-cache option to it,
      which would allow it to store pack files it created in the pack-cache
      directory, and prune rarely used ones from it.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  25. 15 Oct, 2005 1 commit
  26. 13 Oct, 2005 1 commit
      Add support for "local" packing · 64560374
      Linus Torvalds authored
      This adds the "--local" flag to git-pack-objects, which acts like
      "--incremental", except that instead of ignoring all packed objects, it
      only ignores objects that are packed and in an alternate object tree.
      
      As a result, it effectively only does a local re-pack: any remote-packed
      objects will stay in the alternate object directories.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
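The difference between "--incremental" and "--local" comes down to which already-packed objects are skipped. A minimal sketch of that decision, with flags passed as plain ints; `want_object()` and its parameters are hypothetical names.

```c
#include <assert.h>

/*
 * Decide whether an object should go into the new pack.
 * incremental: skip anything that is already packed.
 * local:       skip only objects packed in an alternate object directory.
 */
int want_object(int is_packed, int pack_is_local, int incremental, int local)
{
	assert(!(incremental && local));	/* the sketch treats them as alternatives */
	if (!is_packed)
		return 1;
	if (incremental)
		return 0;
	if (local && !pack_is_local)
		return 0;
	return 1;
}
```

Under "--local", objects sitting in a local pack are repacked while objects in an alternate's pack are left where they are.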