1. 14 Feb, 2019 1 commit
    • pack-objects: drop unused parameter from oe_map_new_pack() · c409d108
      Jeff King authored
      Since 43fa44fa (pack-objects: move in_pack out of struct object_entry,
      2018-04-14), we store the source pack for each object as a small index
      rather than as a pointer. When we see a new pack that has no allocated
      index, we fall back to generating an array of pointers by calling
      oe_map_new_pack().
      
      Perhaps counter-intuitively, that function does not need to actually see
      our new index-less pack. It only allocates and populates the array with
      the existing packs, after which oe_set_in_pack() actually adds the new
      pack to the array.
      
      Let's drop the unused "struct packed_git" argument to oe_map_new_pack()
      to avoid confusion.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  2. 28 Jan, 2019 2 commits
  3. 12 Nov, 2018 1 commit
  4. 19 Oct, 2018 1 commit
    • pack-objects (mingw): initialize `packing_data` mutex in the correct spot · 34204c81
      Johannes Schindelin authored
      In 9ac3f0e5 (pack-objects: fix performance issues on packing large
      deltas, 2018-07-22), a mutex was introduced that is used to guard the
      call to set the delta size. This commit even added code to initialize
      it, but at an incorrect spot: in `init_threaded_search()`, while the
      call to `oe_set_delta_size()` (and hence to `packing_data_lock()`) can
      happen in the call chain `check_object()` <- `get_object_details()` <-
      `prepare_pack()` <- `cmd_pack_objects()`, which is long before the
      `prepare_pack()` function calls `ll_find_deltas()` (which initializes
      the threaded search).
      
      Another tell-tale that the mutex was initialized in an incorrect spot is
      that the function to initialize it lives in builtin/, while the code
      that uses the mutex is defined in a libgit.a header file.
      
      Let's use a more appropriate function: `prepare_packing_data()`, which
      not only lives in libgit.a, but *has* to be called before the
      `packing_data` struct is used that contains that mutex.
      
      This fixes https://github.com/git-for-windows/git/issues/1839.
      Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  5. 29 Aug, 2018 1 commit
    • convert "hashcmp() == 0" to hasheq() · e3ff0683
      Jeff King authored
      This is the partner patch to the previous one, but covering
      the "hash" variants instead of "oid".  Note that our
      coccinelle rule is slightly more complex to avoid triggering
      the call in hasheq().
      
      I didn't bother to add a new rule to convert:
      
        - hasheq(E1->hash, E2->hash)
        + oideq(E1, E2)
      
      Since these are new functions, there won't be any such
      existing callers. And since most of the code is already
      using oideq, we're not likely to introduce new ones.
      
      We might still see "!hashcmp(E1->hash, E2->hash)" from topics
      in flight. But because our new rule comes after the existing
      ones, that should first get converted to "!oidcmp(E1, E2)"
      and then to "oideq(E1, E2)".
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
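
      A minimal sketch of how the two helpers relate, assuming a 20-byte
      SHA-1 hash. The `_sketch` names and the RAWSZ_SKETCH constant are
      illustrative and not Git's actual definitions. Because the equality
      helper is itself written in terms of the comparison, a rewrite rule
      for "!hashcmp()" has to leave that one call site alone, which is the
      extra complexity the message mentions.

```c
#include <assert.h>
#include <string.h>

#define RAWSZ_SKETCH 20 /* assumed hash size in bytes (SHA-1) */

/* Three-way comparison, like memcmp(): negative, zero or positive. */
static int hashcmp_sketch(const unsigned char *a, const unsigned char *b)
{
	assert(a && b);
	return memcmp(a, b, RAWSZ_SKETCH);
}

/* Equality only: callers write hasheq_sketch(a, b), not !hashcmp_sketch(a, b). */
static int hasheq_sketch(const unsigned char *a, const unsigned char *b)
{
	return !hashcmp_sketch(a, b);
}
```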
  6. 21 Aug, 2018 1 commit
    • pack-objects: reuse on-disk deltas for thin "have" objects · 6a1e32d5
      Jeff King authored
      When we serve a fetch, we pass the "wants" and "haves" from
      the fetch negotiation to pack-objects. That tells us not
      only which objects we need to send, but we also use the
      boundary commits as "preferred bases": their trees and blobs
      are candidates for delta bases, both for reusing on-disk
      deltas and for finding new ones.
      
      However, this misses some opportunities. Modulo some special
      cases like shallow or partial clones, we know that every
      object reachable from the "haves" could be a preferred base.
      We don't use all of them for two reasons:
      
        1. It's expensive to traverse the whole history and
           enumerate all of the objects the other side has.
      
        2. The delta search is expensive, so we want to keep the
           number of candidate bases sane. The boundary commits
           are the most likely to work.
      
      When we have reachability bitmaps, though, reason 1 no
      longer applies. We can efficiently compute the set of
      reachable objects on the other side (and in fact already did
      so as part of the bitmap set-difference to get the list of
      interesting objects). And using this set conveniently
      covers the shallow and partial cases, since we have to
      disable the use of bitmaps for those anyway.
      
      The second reason argues against using these bases in the
      search for new deltas. But there's one case where we can use
      this information for free: when we have an existing on-disk
      delta that we're considering reusing, we can do so if we
      know the other side has the base object. This in fact saves
      time during the delta search, because it's one less delta we
      have to compute.
      
      And that's exactly what this patch does: when we're
      considering whether to reuse an on-disk delta, if bitmaps
      tell us the other side has the object (and we're making a
      thin-pack), then we reuse it.
      
      Here are the results on p5311 using linux.git, which
      simulates a client fetching after `N` days since their last
      fetch:
      
       Test                         origin              HEAD
       --------------------------------------------------------------------------
       5311.3: server   (1 days)    0.27(0.27+0.04)     0.12(0.09+0.03) -55.6%
       5311.4: size     (1 days)               0.9M              237.0K -73.7%
       5311.5: client   (1 days)    0.04(0.05+0.00)     0.10(0.10+0.00) +150.0%
       5311.7: server   (2 days)    0.34(0.42+0.04)     0.13(0.10+0.03) -61.8%
       5311.8: size     (2 days)               1.5M              347.7K -76.5%
       5311.9: client   (2 days)    0.07(0.08+0.00)     0.16(0.15+0.01) +128.6%
       5311.11: server   (4 days)   0.56(0.77+0.08)     0.13(0.10+0.02) -76.8%
       5311.12: size     (4 days)              2.8M              566.6K -79.8%
       5311.13: client   (4 days)   0.13(0.15+0.00)     0.34(0.31+0.02) +161.5%
       5311.15: server   (8 days)   0.97(1.39+0.11)     0.30(0.25+0.05) -69.1%
       5311.16: size     (8 days)              4.3M                1.0M -76.0%
       5311.17: client   (8 days)   0.20(0.22+0.01)     0.53(0.52+0.01) +165.0%
       5311.19: server  (16 days)   1.52(2.51+0.12)     0.30(0.26+0.03) -80.3%
       5311.20: size    (16 days)              8.0M                2.0M -74.5%
       5311.21: client  (16 days)   0.40(0.47+0.03)     1.01(0.98+0.04) +152.5%
       5311.23: server  (32 days)   2.40(4.44+0.20)     0.31(0.26+0.04) -87.1%
       5311.24: size    (32 days)             14.1M                4.1M -70.9%
       5311.25: client  (32 days)   0.70(0.90+0.03)     1.81(1.75+0.06) +158.6%
       5311.27: server  (64 days)   11.76(26.57+0.29)   0.55(0.50+0.08) -95.3%
       5311.28: size    (64 days)             89.4M               47.4M -47.0%
       5311.29: client  (64 days)   5.71(9.31+0.27)     15.20(15.20+0.32) +166.2%
       5311.31: server (128 days)   16.15(36.87+0.40)   0.91(0.82+0.14) -94.4%
       5311.32: size   (128 days)            134.8M              100.4M -25.5%
       5311.33: client (128 days)   9.42(16.86+0.49)    25.34(25.80+0.46) +169.0%
      
      In all cases we save CPU time on the server (sometimes
      significant) and the resulting pack is smaller. We do spend
      more CPU time on the client side, because it has to
      reconstruct more deltas. But that's the right tradeoff to
      make, since clients tend to outnumber servers. It just means
      the thin pack mechanism is doing its job.
      
      From the user's perspective, the end-to-end time of the
      operation will generally be faster. E.g., in the 128-day
      case, we saved 15s on the server at a cost of 16s on the
      client. Since the resulting pack is 34MB smaller, this is a
      net win if the network speed is less than 270Mbit/s. And
      that's actually the worst case. The 64-day case saves just
      over 11s at a cost of just under 11s. So it's a slight win
      at any network speed, and the 40MB saved is pure bonus. That
      trend continues for the smaller fetches.
      
      The implementation itself is mostly straightforward, with
      the new logic going into check_object(). But there are two
      tricky bits.
      
      The first is that check_object() needs access to the
      relevant information (the thin flag and bitmap result). We
      can do this by pushing these into program-lifetime globals.
      
      The second is that the rest of the code assumes that any
      reused delta will point to another "struct object_entry" as
      its base. But of course the case we are interested in here
      is the one where we don't have such an entry!
      
      I looked at a number of options that didn't quite work:
      
       - we could use a flag to signal a reused delta, but it's
         not a single bit. We have to actually store the oid of
         the base, which is normally done by pointing to the
         existing object_entry. And we'd have to modify all the
         code which looks at deltas.
      
       - we could add the reused bases to the end of the existing
         object_entry array. While this does create some extra
         work as later stages consider the extra entries, it's
         actually not too bad (we're not sending them, so they
         don't cost much in the delta search, and at most we'd
         have 2*N of them).
      
         But there's a more subtle problem. Adding to the existing
         array means we might need to grow it with realloc, which
         could move the earlier entries around. While many of the
         references to other entries are done by integer index,
         some (including ones on the stack) use pointers, which
         would become invalidated.
      
         This isn't insurmountable, but it would require quite a
         bit of refactoring (and it's hard to know that you've got
         it all, since it may work _most_ of the time and then
         fail subtly based on memory allocation patterns).
      
       - we could allocate a new one-off entry for the base. In
         fact, this is what an earlier version of this patch did.
         However, since the refactoring brought in by ad635e82
         (Merge branch 'nd/pack-objects-pack-struct', 2018-05-23),
         the delta_idx code requires that both entries be in the
         main packing list.
      
      So taking all of those options into account, what I ended up
      with is a separate list of "external bases" that are not
      part of the main packing list. Each delta entry that points
      to an external base has a single-bit flag to do so; we have a
      little breathing room in the bitfield section of
      object_entry.
      
      This lets us limit the change primarily to the oe_delta()
      and oe_set_delta_ext() functions. And as a bonus, most of
      the rest of the code does not consider these dummy entries
      at all, saving both runtime CPU and code complexity.
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
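
      The "external bases" idea can be sketched roughly as follows. This
      is an illustration under assumptions, not the patch itself: the
      struct layouts, field names and the 1-based index convention are
      hypothetical, chosen only to show how a single flag bit lets one
      lookup function serve both the main packing list and the separate
      list.

```c
#include <assert.h>
#include <stddef.h>

struct entry_sketch {
	unsigned delta_idx;    /* assumed: 1-based index of the base, 0 = no delta */
	unsigned ext_base : 1; /* set: delta_idx refers to ext_bases[], not objects[] */
};

struct packing_sketch {
	struct entry_sketch *objects;   /* main packing list (objects being sent) */
	struct entry_sketch *ext_bases; /* bases the other side already has */
};

/* Resolve the delta base of e, whichever list it lives in. */
static struct entry_sketch *delta_base_sketch(const struct packing_sketch *p,
					      const struct entry_sketch *e)
{
	assert(p && e);
	if (!e->delta_idx)
		return NULL;
	if (e->ext_base)
		return &p->ext_bases[e->delta_idx - 1];
	return &p->objects[e->delta_idx - 1];
}
```

      Code that only walks objects[] never sees the external entries,
      which is the "bonus" described in the last paragraph of the message.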
  7. 20 Aug, 2018 1 commit
  8. 16 Aug, 2018 2 commits
  9. 23 Jul, 2018 1 commit
    • pack-objects: fix performance issues on packing large deltas · 9ac3f0e5
      Duy Nguyen authored
      Let's start with some background about oe_delta_size() and
      oe_set_delta_size(). If you already know, skip the next paragraph.
      
      These two were added in 0aca34e8 (pack-objects: shrink delta_size
      field in struct object_entry - 2018-04-14) to help reduce the size of
      'struct object_entry'. The delta size field in this struct was reduced
      to hold at most 1MB. So if any new delta is produced and is larger than
      1MB, it's dropped because we can't really save such a large size
      anywhere. A fallback is provided in case existing packfiles already
      have large deltas: we can then retrieve the size from the pack.
      
      While this should help small machines repacking large repos without
      large deltas (i.e. less memory pressure), dropping large deltas during
      the delta selection process could end up with worse pack files. And if
      existing packfiles already have >1MB delta and pack-objects is
      instructed to not reuse deltas, all of them will be dropped on the
      floor, and the resulting pack would be definitely bigger.
      
      There is also a regression in terms of CPU/IO if we have large on-disk
      deltas, because the fallback code needs to parse the pack every time
      the delta size is needed, and merely accessing the mmap'd pack data is
      enough to cause extra page faults when memory is under pressure.
      
      Both of these issues were reported on the mailing list. Here are some
      numbers for comparison.
      
          Version  Pack (MB)  MaxRSS(kB)  Time (s)
          -------  ---------  ----------  --------
           2.17.0     5498     43513628    2494.85
           2.18.0    10531     40449596    4168.94
      
      This patch provides a better fallback that is
      
      - cheaper in terms of cpu and io because we won't have to read
        existing pack files as much
      
      - better in terms of pack size because the pack heuristics are back
        to what they were in 2.17.0: we do not drop large deltas at all
      
      If we encounter any delta (on-disk or created during try_delta phase)
      that is larger than the 1MB limit, we stop using delta_size_ field for
      this because it can't contain such size anyway. A new array of delta
      size is dynamically allocated and can hold all the deltas that 2.17.0
      can. This array only contains delta sizes that delta_size_ can't
      contain.
      
      With this, we do not have to drop deltas in try_delta() anymore. Of
      course the downside is we use slightly more memory, even compared to
      2.17.0. But since this is considered an uncommon case, a bit more
      memory consumption should not be a problem.
      
      Delta size limit is also raised from 1MB to 16MB to better cover
      common case and avoid that extra memory consumption (99.999% deltas in
      this reported repo are under 12MB; Jeff noted binary artifacts topped
      out at about 3MB in some other private repos). Other fields are
      shuffled around to keep this struct packed tight. We don't use more
      memory in common case even with this limit update.
      
      A note about thread synchronization. Since this code can be run in
      parallel during delta searching phase, we need a mutex. The realloc
      part in packlist_alloc() is not protected because it only happens
      during the object counting phase, which is always single-threaded.
      
      Access to e->delta_size_ (and by extension
      pack->delta_size[e - pack->objects]) is unprotected as before, the
      thread scheduler in pack-objects must make sure "e" is never updated
      by two different threads.
      
      The area under the new lock is as small as possible, avoiding locking
      at all in common case, since lock contention with high thread count
      could be expensive (most blobs are small enough that delta compute
      time is short and we end up taking the lock very often). The previous
      attempt to always hold a lock in oe_delta_size() and
      oe_set_delta_size() increases execution time by 33% when repacking
      linux.git with 40 threads.
      Reported-by: Elijah Newren <newren@gmail.com>
      Helped-by: Elijah Newren <newren@gmail.com>
      Helped-by: Jeff King <peff@peff.net>
      Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
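
      A rough sketch of the fallback described above: small sizes stay in
      the bitfield, and only sizes that do not fit go to a lazily
      allocated side array. The names, the 24-bit width (2^24 bytes =
      16MB) and the struct layout are assumptions made for illustration.
      Locking is left out; per the message, the real code takes a mutex
      only on the rare overflow path.

```c
#include <assert.h>
#include <stdlib.h>

#define DELTA_SIZE_BITS_SKETCH 24 /* assumed width: 2^24 bytes = 16MB */

struct entry_sketch {
	unsigned delta_size_ : DELTA_SIZE_BITS_SKETCH;
	unsigned delta_size_valid : 1; /* 0: the real size is in the side array */
};

struct packing_sketch {
	struct entry_sketch *objects;
	size_t nr_objects;
	unsigned long *delta_size; /* side array, allocated on first overflow */
};

static void set_delta_size_sketch(struct packing_sketch *p,
				  struct entry_sketch *e, unsigned long size)
{
	if (size < (1UL << DELTA_SIZE_BITS_SKETCH)) {
		/* Common case: fits in the bitfield, no extra memory. */
		e->delta_size_ = (unsigned)size;
		e->delta_size_valid = 1;
		return;
	}
	/* Rare case: remember the size outside the entry (unlocked in this sketch). */
	if (!p->delta_size) {
		p->delta_size = calloc(p->nr_objects, sizeof(*p->delta_size));
		assert(p->delta_size);
	}
	p->delta_size[e - p->objects] = size;
	e->delta_size_valid = 0;
}

static unsigned long delta_size_sketch(const struct packing_sketch *p,
				       const struct entry_sketch *e)
{
	if (e->delta_size_valid)
		return e->delta_size_;
	return p->delta_size[e - p->objects];
}
```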
  10. 06 May, 2018 1 commit
    • Replace all die("BUG: ...") calls by BUG() ones · 033abf97
      Johannes Schindelin authored
      In d8193743 (usage.c: add BUG() function, 2017-05-12), a new macro
      was introduced to use for reporting bugs instead of die(). It was then
      subsequently used to convert one single caller in 588a538a
      (setup_git_env: convert die("BUG") to BUG(), 2017-05-12).
      
      The cover letter of the patch series containing that patch
      (cf 20170513032414.mfrwabt4hovujde2@sigill.intra.peff.net) is not
      terribly clear about why only one call site was converted, or what
      the plan is for other, similar calls to die() to report bugs.
      
      Let's just convert all remaining ones in one fell swoop.
      
      This trick was performed by this invocation:
      
      	sed -i 's/die("BUG: /BUG("/g' $(git grep -l 'die("BUG' \*.c)
      Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  11. 16 Apr, 2018 2 commits
    • pack-objects: shrink size field in struct object_entry · ac77d0c3
      Duy Nguyen authored
      It's very rare that an uncompressed object is larger than 4GB
      (partly because Git does not handle those large files very well to
      begin with). Let's optimize it for the common case where object size
      is smaller than this limit.
      
      Shrink size field down to 31 bits and one overflow bit. If the size is
      too large, we read it back from disk. As noted in the previous patch,
      we need to return the delta size instead of canonical size when the
      to-be-reused object entry type is a delta instead of a canonical one.
      
      Add two compare helpers that can take advantage of the overflow
      bit (e.g. if the file is 4GB+, chances are it's already larger than
      core.bigFileThreshold and there's no point in comparing the actual
      value).
      
      Another note about oe_get_size_slow(). This function MUST be thread
      safe because SIZE() macro is used inside try_delta() which may run in
      parallel. Outside parallel code, no-contention locking should be dirt
      cheap (or insignificant compared to i/o access anyway). To exercise
      this code, it's best to run the test suite with something like
      
          make test GIT_TEST_OE_SIZE=4
      
      which forces this code on all objects larger than 3 bytes.
      Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
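
      The size field plus overflow bit, and the kind of compare helper
      the message describes, might look like the sketch below. All names
      are hypothetical, and the "slow path" is modeled as a callback
      instead of a real read from disk. The point of the helper is that
      an overflowed size is known to be at least 2^31, so a comparison
      against a small limit needs no I/O at all.

```c
#include <assert.h>
#include <stdint.h>

#define SIZE_BITS_SKETCH 31

struct entry_sketch {
	unsigned size_ : SIZE_BITS_SKETCH;
	unsigned size_valid : 1; /* 0: too large to store, must be re-read */
};

/* Hypothetical stand-in for re-reading the object size from disk. */
typedef uint64_t (*slow_size_fn)(const struct entry_sketch *e);

/* Example slow path used for demonstration only: pretends the object is 5GB. */
static uint64_t fake_slow_size_sketch(const struct entry_sketch *e)
{
	(void)e;
	return UINT64_C(5) << 30;
}

static void set_size_sketch(struct entry_sketch *e, uint64_t size)
{
	if (size < (UINT64_C(1) << SIZE_BITS_SKETCH)) {
		e->size_ = (unsigned)size;
		e->size_valid = 1;
	} else {
		e->size_valid = 0; /* overflow: only the flag is recorded */
	}
}

static uint64_t get_size_sketch(const struct entry_sketch *e, slow_size_fn slow)
{
	if (e->size_valid)
		return e->size_;
	assert(slow);
	return slow(e);
}

/* Non-zero if the object is smaller than limit. */
static int size_less_than_sketch(const struct entry_sketch *e, uint64_t limit,
				 slow_size_fn slow)
{
	if (e->size_valid)
		return e->size_ < limit;
	/* Overflowed, so size >= 2^31: small limits are decided without I/O. */
	if (limit <= (UINT64_C(1) << SIZE_BITS_SKETCH))
		return 0;
	return get_size_sketch(e, slow) < limit;
}
```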
    • pack-objects: move in_pack out of struct object_entry · 43fa44fa
      Duy Nguyen authored
      Instead of using 8 bytes (on 64-bit arch) to store a pointer to a
      pack, use an index, since the number of packs should be relatively
      small.
      
      This limits the number of packs we can handle to 1k. Since we can't
      be sure people will never run into the situation where they have more
      than 1k pack files, provide a fallback route for it.
      
      If we find out they have too many packs, the new in_pack_by_idx[]
      array (which has at most 1k elements) will not be used. Instead we
      allocate in_pack[] array that holds nr_objects elements. This is
      similar to how the optional in_pack_pos field is handled.
      
      The new simple test is just to make sure the too-many-packs code path
      is at least executed. The true test is running
      
          make test GIT_TEST_FULL_IN_PACK_ARRAY=1
      
      to take advantage of other special case tests.
      Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
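
      A simplified sketch of the index-with-fallback scheme, assuming a
      10-bit index (1024 packs) and hypothetical struct and function
      names. It also shows why the mapping step needs no reference to
      the pack that triggered it, the point made in c409d108 at the top
      of this listing: it only copies what the small table already
      knows, and the setter stores the new pack afterwards.

```c
#include <assert.h>
#include <stdlib.h>

#define IN_PACK_BITS_SKETCH 10 /* assumed: 2^10 = 1024 indexable packs */

struct pack_sketch { unsigned index; };

struct entry_sketch { unsigned in_pack_idx : IN_PACK_BITS_SKETCH; };

struct packing_sketch {
	struct entry_sketch *objects;
	size_t nr_objects;
	struct pack_sketch **in_pack_by_idx; /* small table, used while packs are few */
	struct pack_sketch **in_pack;        /* fallback: one pointer per object */
};

/* Switch to the fallback array; takes no "new pack" argument. */
static void map_new_pack_sketch(struct packing_sketch *p)
{
	size_t i;

	p->in_pack = calloc(p->nr_objects, sizeof(*p->in_pack));
	assert(p->in_pack);
	for (i = 0; i < p->nr_objects; i++)
		p->in_pack[i] = p->in_pack_by_idx[p->objects[i].in_pack_idx];
	free(p->in_pack_by_idx);
	p->in_pack_by_idx = NULL;
}

static void set_in_pack_sketch(struct packing_sketch *p, struct entry_sketch *e,
			       struct pack_sketch *pack, int pack_has_index)
{
	if (!pack_has_index && !p->in_pack)
		map_new_pack_sketch(p);
	if (p->in_pack)
		p->in_pack[e - p->objects] = pack; /* the new pack is added here */
	else
		e->in_pack_idx = pack->index;
}

static struct pack_sketch *in_pack_sketch(const struct packing_sketch *p,
					  const struct entry_sketch *e)
{
	if (p->in_pack)
		return p->in_pack[e - p->objects];
	return p->in_pack_by_idx[e->in_pack_idx];
}
```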
  12. 08 May, 2017 1 commit
  13. 18 Sep, 2014 1 commit
  14. 07 Jul, 2014 1 commit
  15. 02 Jun, 2014 1 commit
    • pack-objects: use free()+xcalloc() instead of xrealloc()+memset() · fb799474
      René Scharfe authored
      Whenever the hash table becomes too small, its size is increased,
      the original part (and the added space) is zeroed out using memset(),
      and the table is rebuilt from scratch.
      
      Simplify this process by returning the old memory using free() and
      allocating the new buffer using xcalloc(), which already clears the
      buffer for us.  That way we avoid copying the old hash table contents
      needlessly inside xrealloc().
      
      While at it, use the first array member with sizeof instead of a
      specific type.  The old code used uint32_t and int, while index is
      actually an array of int32_t.  Their sizes are the same basically
      everywhere, so it's not actually a problem, but the new code is
      cleaner and doesn't have to be touched should the type be changed.
      Signed-off-by: Rene Scharfe <l.s.r@web.de>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
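
      The pattern can be illustrated with the short sketch below. Plain
      calloc() stands in for Git's xcalloc() wrapper, and the struct and
      function names are made up for the example. Since the table is
      rebuilt from scratch anyway, nothing in the old buffer is worth
      copying, and sizeof(*t->index) keeps working if the element type
      ever changes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct table_sketch {
	int32_t *index;
	size_t index_size;
};

/*
 * Replace the table with a larger, zero-filled one. The caller is
 * expected to re-insert every entry afterwards.
 */
static void grow_index_sketch(struct table_sketch *t, size_t new_size)
{
	assert(new_size > t->index_size);
	free(t->index);                                 /* old contents are not needed */
	t->index = calloc(new_size, sizeof(*t->index)); /* already zeroed, no memset() */
	assert(t->index);
	t->index_size = new_size;
}
```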
  16. 24 Oct, 2013 1 commit
    • pack-objects: refactor the packing list · 2834bc27
      Vicent Marti authored
      The hash table that stores the packing list for a given `pack-objects`
      run was tightly coupled to the pack-objects code.
      
      In this commit, we refactor the hash table and the underlying storage
      array into a `packing_data` struct. The functionality for accessing and
      adding entries to the packing list is hence accessible from other parts
      of Git besides the `pack-objects` builtin.
      
      This refactoring is a requirement for further patches in this series
      that will require accessing the commit packing list from outside of
      `pack-objects`.
      
      The hash table implementation has been minimally altered: we now
      use table sizes which are always a power of two, to ensure a uniform
      index distribution in the array.
      Signed-off-by: Vicent Marti <tanoku@gmail.com>
      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
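
      As an illustration of what a power-of-two table size buys, here is
      a small sketch of slot selection by bit mask. The function names
      are hypothetical, and linear probing is an assumption made for the
      example; the commit message does not say how collisions are
      resolved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int is_power_of_two_sketch(size_t n)
{
	return n && !(n & (n - 1));
}

/* First probe position for a hash in a table whose size is a power of two. */
static size_t first_slot_sketch(uint32_t hash, size_t table_size)
{
	assert(is_power_of_two_sketch(table_size));
	return hash & (table_size - 1); /* equals hash % table_size, without a division */
}

/* Next probe position, wrapping around at the end of the table. */
static size_t next_slot_sketch(size_t slot, size_t table_size)
{
	assert(is_power_of_two_sketch(table_size));
	return (slot + 1) & (table_size - 1);
}
```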
  17. 04 Aug, 2006 1 commit
  18. 25 Jul, 2006 1 commit
  19. 24 Jul, 2006 1 commit
  20. 10 Jul, 2006 1 commit
  21. 01 Jul, 2006 1 commit
  22. 30 Jun, 2006 1 commit
  23. 29 Jun, 2006 1 commit
  24. 21 Jun, 2006 1 commit
  25. 20 Jun, 2006 1 commit
  26. 06 Jun, 2006 1 commit
    • pack-objects: improve path grouping heuristics. · ce0bd642
      Linus Torvalds authored
      This trivial patch not only simplifies the name hashing, it actually
      improves packing for both git and the kernel.
      
      The git archive pack shrinks from 6824090->6622627 bytes (a 3%
      improvement), and the kernel pack shrinks from 108756213 to 108219021 (a
      mere 0.5% improvement, but still, it's an improvement from making the
      hashing much simpler!)
      
      We just create a 32-bit hash, where we "age" previous characters by two
      bits, so the last characters in a filename count most. So when we then
      compare the hashes in the sort routine, filenames that end the same way
      sort the same way.
      
      It takes the subdirectory into account (unless the filename is > 16
      characters), but files with the same name within the same subdirectory
      will obviously sort closer than files in different subdirectories.
      
      And, incidentally (which is why I tried the hash change in the first
      place, of course) builtin-rev-list.c will sort fairly close to rev-list.c.
      
      And no, it's not a "good hash" in the sense of being secure or unique, but
      that's not what we're looking for. The whole "hash" thing is misnamed
      here. It's not so much a hash as a "sorting number".
      
      [jc: rolled in simplification for computing the sorting number
       computation for thin pack base objects]
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
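
      The "aging" described above can be sketched as follows. This is a
      reconstruction from the prose and not necessarily the exact code
      of the patch; the function name is made up. Each new character
      lands in the top byte and everything before it is shifted down by
      two bits, so after 16 characters a byte has been shifted out of
      the 32-bit value entirely.

```c
#include <assert.h>
#include <stdint.h>

/* A "sorting number" where the last characters of the name weigh most. */
static uint32_t name_hash_sketch(const char *name)
{
	uint32_t hash = 0;
	unsigned char c;

	assert(name);
	while ((c = (unsigned char)*name++) != 0) {
		/* Age what we have by two bits, then put the new character on top. */
		hash = (hash >> 2) + ((uint32_t)c << 24);
	}
	return hash;
}
```

      With this sketch, "builtin-rev-list.c" and "rev-list.c" produce
      values that differ only in the low bits, while a name with a
      different ending such as "Makefile" lands far away.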
  27. 31 May, 2006 1 commit
    • tree_entry(): new tree-walking helper function · 4c068a98
      Linus Torvalds authored
      This adds a "tree_entry()" function that combines the common operation of
      doing a "tree_entry_extract()" + "update_tree_entry()".
      
      It also has a simplified calling convention, designed for simple loops
      that traverse over a whole tree: the arguments are pointers to the tree
      descriptor and a name_entry structure to fill in, and it returns a boolean
      "true" if there was an entry left to be gotten in the tree.
      
      This allows tree traversal with
      
      	struct tree_desc desc;
      	struct name_entry entry;
      
      	desc.buf = tree->buffer;
      	desc.size = tree->size;
      	while (tree_entry(&desc, &entry)) {
      		... use "entry.{path, sha1, mode, pathlen}" ...
      	}
      
      which is not only shorter than writing it out in full, it's hopefully less
      error prone too.
      
      [ It's actually a tad faster too - we don't need to recalculate the entry
        pathlength in both extract and update, but need to do it only once.
        Also, some callers can avoid doing a "strlen()" on the result, since
        it's returned as part of the name_entry structure.
      
        However, by now we're talking just 1% speedup on "git-rev-list --objects
        --all", and we're definitely at the point where tree walking is no
        longer the issue any more. ]
      
      NOTE! Not everybody wants to use this new helper function, since some of
      the tree walkers very much on purpose do the descriptor update separately
      from the entry extraction. So the "extract + update" sequence still
      remains as the core sequence, this is just a simplified interface.
      
      We should probably add a silly two-line inline helper function for
      initializing the descriptor from the "struct tree" too, just to cut down
      on the noise from that common "desc" initializer.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
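
      For readers who want to see the "extract + update" combination
      spelled out, here is a self-contained sketch. It is not the
      helper from the patch: the `_sketch` types and function are
      stand-ins, and it assumes the tree object layout of an octal mode,
      a space, the path, a NUL byte and a 20-byte SHA-1 per entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SHA1_LEN_SKETCH 20

struct tree_desc_sketch {
	const unsigned char *buf;
	size_t size;
};

struct name_entry_sketch {
	const unsigned char *sha1;
	const char *path;
	unsigned int mode;
	size_t pathlen;
};

/* Fill in *entry from the front of *desc and advance; 0 when nothing is left. */
static int tree_entry_sketch(struct tree_desc_sketch *desc,
			     struct name_entry_sketch *entry)
{
	const unsigned char *p = desc->buf;
	const unsigned char *end = desc->buf + desc->size;
	const unsigned char *nul;
	unsigned int mode = 0;

	assert(entry);
	if (!desc->size)
		return 0;

	/* Parse the octal mode up to the separating space. */
	while (p < end && *p != ' ') {
		if (*p < '0' || *p > '7')
			return 0; /* malformed entry */
		mode = (mode << 3) + (unsigned int)(*p++ - '0');
	}
	if (p == end)
		return 0;
	p++; /* skip the space */

	nul = memchr(p, 0, (size_t)(end - p));
	if (!nul || (size_t)(end - nul - 1) < SHA1_LEN_SKETCH)
		return 0; /* truncated entry */

	/* "Extract": the path length is computed once, here. */
	entry->mode = mode;
	entry->path = (const char *)p;
	entry->pathlen = (size_t)(nul - p);
	entry->sha1 = nul + 1;

	/* "Update": step over the entry we just returned. */
	desc->buf = nul + 1 + SHA1_LEN_SKETCH;
	desc->size = (size_t)(end - desc->buf);
	return 1;
}
```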
  28. 16 May, 2006 1 commit
  29. 15 May, 2006 3 commits
    • Fix pack-index issue on 64-bit platforms a bit more portably. · 1b9bc5a7
      Junio C Hamano authored
      Apparently <stdint.h> is not enough for uint32_t on OpenBSD; use
      "unsigned int" -- hopefully that would stay 32-bit on every
      platform we care about, at least until we update the pack-index
      file format.
      
      Our sha1 routines optimized for specific architectures use uint32_t
      and expect '#include <stdint.h>' to be enough, so OpenBSD on arm or
      ppc might have similar issues down the road, I dunno.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
    • pack-object: slightly more efficient · ff45715c
      Nicolas Pitre authored
      Avoid creating a delta index for objects with maximum depth since they
      are not going to be used as delta base anyway.  This also reduces peak
      memory usage slightly, as the current object's delta index is not useful
      until the next object in the loop is considered for deltification. This
      saves a bit more than 1% on CPU usage.
      Signed-off-by: Nicolas Pitre <nico@cam.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
    • simple euristic for further free packing improvements · 4e8da195
      Nicolas Pitre authored
      Given that the early eviction of objects with maximum delta depth
      may exhibit bad packing on its own, why not consider a bias against
      deep base objects in try_delta() to mitigate that bad behavior?
      
      This patch adjusts the MAX_size allowed for a delta based on the depth
      of the base object, and also enables the early eviction of max depth
      objects from the object window.  When used separately, those two things
      produce slightly better and much worse results respectively.  But their
      combined effect is a surprisingly significant packing improvement.
      
      With this really simple patch the GIT repo gets nearly 15% smaller, and
      the Linux kernel repo about 5% smaller, with no significantly measurable
      CPU usage difference.
      Signed-off-by: Nicolas Pitre <nico@cam.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
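
      One way to picture the depth bias is a size allowance that shrinks
      as the base gets deeper. The sketch below assumes a simple linear
      scale; the message does not quote the actual formula, so treat
      both the function name and the arithmetic as illustrative only.

```c
#include <assert.h>

/*
 * Scale the largest acceptable delta down as the base object gets
 * deeper: a base at depth 0 keeps the full allowance, a base at
 * max_depth gets none. (Assumed linear scale; max_size is expected to
 * be small enough that the multiplication does not overflow.)
 */
static unsigned long biased_max_size_sketch(unsigned long max_size,
					    unsigned int base_depth,
					    unsigned int max_depth)
{
	assert(max_depth > 0 && base_depth <= max_depth);
	return max_size * (max_depth - base_depth) / max_depth;
}
```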
  30. 14 May, 2006 1 commit
  31. 13 May, 2006 1 commit
    • Fix git-pack-objects for 64-bit platforms · 66561f5a
      Dennis Stosberg authored
      The offset of an object in the pack is recorded as a 4-byte integer
      in the index file.  When reading the offset from the mmap'ed index
      in prepare_pack_revindex(), the address is dereferenced as a long*.
      This works fine as long as the long type is four bytes wide.  On
      NetBSD/sparc64, however, a long is 8 bytes wide and so dereferencing
      the offset produces garbage.
      
      [jc: taking suggestion by Linus to use uint32_t]
      Signed-off-by: Dennis Stosberg <dennis@stosberg.net>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
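
      The underlying issue is that the width of "long" varies by
      platform while the on-disk field is always 4 bytes. The patch
      switches to uint32_t; the sketch below goes one step further and
      assembles the value byte by byte, which also avoids alignment and
      host byte-order assumptions. It assumes the offset is stored in
      network (big-endian) byte order, and the function name is
      hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 4-byte big-endian offset without relying on sizeof(long). */
static uint32_t read_offset_sketch(const unsigned char *p)
{
	assert(p);
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```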
  32. 05 May, 2006 1 commit
    • pack-object: squelch eye-candy on non-tty · 86118bcb
      Junio C Hamano authored
      One of my post-update scripts runs a git-fetch into a separate
      repository and sends the results back to me (2>&1); I end up
      getting this in the mail:
      
          Generating pack...
          Done counting 180 objects.
          Result has 131 objects.
          Deltifying 131 objects.
             0% (0/131) done^M   1% (2/131) done^M...
      
      This changes the default so that no progress report is shown when not
      on a tty.
      
      You could give --progress to force the progress report, but
      let's not even bother documenting it or mentioning it in the
      usage string.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
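
      The decision described above boils down to "show progress if
      forced, otherwise only on a terminal". A minimal sketch, assuming
      the progress output goes to stderr (as the 2>&1 in the message
      suggests) and using made-up function names:

```c
#include <assert.h>
#include <unistd.h>

/* Pure decision helper: an explicit --progress wins, else follow the tty check. */
static int want_progress_sketch(int forced, int is_tty)
{
	return forced || is_tty;
}

/* Hypothetical caller: asks whether stderr is a terminal via POSIX isatty(). */
static int progress_default_sketch(int forced)
{
	assert(forced == 0 || forced == 1);
	return want_progress_sketch(forced, isatty(STDERR_FILENO));
}
```

      Keeping the terminal check out of the decision helper makes the
      rule easy to exercise without an actual tty.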
  33. 28 Apr, 2006 1 commit
  34. 27 Apr, 2006 1 commit
  35. 21 Apr, 2006 1 commit