Skip to content
  • Jeff King's avatar
    pack-objects: reuse on-disk deltas for thin "have" objects · 6a1e32d5
    Jeff King authored and Junio C Hamano's avatar Junio C Hamano committed
    When we serve a fetch, we pass the "wants" and "haves" from
    the fetch negotiation to pack-objects. That tells us not
    only which objects we need to send, but we also use the
    boundary commits as "preferred bases": their trees and blobs
    are candidates for delta bases, both for reusing on-disk
    deltas and for finding new ones.
    
    However, this misses some opportunities. Modulo some special
    cases like shallow or partial clones, we know that every
    object reachable from the "haves" could be a preferred base.
    We don't use all of them for two reasons:
    
      1. It's expensive to traverse the whole history and
         enumerate all of the objects the other side has.
    
      2. The delta search is expensive, so we want to keep the
         number of candidate bases sane. The boundary commits
         are the most likely to work.
    
    When we have reachability bitmaps, though, reason 1 no
    longer applies. We can efficiently compute the set of
    reachable objects on the other side (and in fact already did
    so as part of the bitmap set-difference to get the list of
    interesting objects). And using this set conveniently
    covers the shallow and partial cases, since we have to
    disable the use of bitmaps for those anyway.
    
    The second reason argues against using these bases in the
    search for new deltas. But there's one case where we can use
    this information for free: when we have an existing on-disk
    delta that we're considering reusing, we can do so if we
    know the other side has the base object. This in fact saves
    time during the delta search, because it's one less delta we
    have to compute.
    
    And that's exactly what this patch does: when we're
    considering whether to reuse an on-disk delta, if bitmaps
    tell us the other side has the object (and we're making a
    thin-pack), then we reuse it.
    
    Here are the results on p5311 using linux.git, which
    simulates a client fetching after `N` days since their last
    fetch:
    
     Test                         origin              HEAD
     --------------------------------------------------------------------------
     5311.3: server   (1 days)    0.27(0.27+0.04)     0.12(0.09+0.03) -55.6%
     5311.4: size     (1 days)               0.9M              237.0K -73.7%
     5311.5: client   (1 days)    0.04(0.05+0.00)     0.10(0.10+0.00) +150.0%
     5311.7: server   (2 days)    0.34(0.42+0.04)     0.13(0.10+0.03) -61.8%
     5311.8: size     (2 days)               1.5M              347.7K -76.5%
     5311.9: client   (2 days)    0.07(0.08+0.00)     0.16(0.15+0.01) +128.6%
     5311.11: server   (4 days)   0.56(0.77+0.08)     0.13(0.10+0.02) -76.8%
     5311.12: size     (4 days)              2.8M              566.6K -79.8%
     5311.13: client   (4 days)   0.13(0.15+0.00)     0.34(0.31+0.02) +161.5%
     5311.15: server   (8 days)   0.97(1.39+0.11)     0.30(0.25+0.05) -69.1%
     5311.16: size     (8 days)              4.3M                1.0M -76.0%
     5311.17: client   (8 days)   0.20(0.22+0.01)     0.53(0.52+0.01) +165.0%
     5311.19: server  (16 days)   1.52(2.51+0.12)     0.30(0.26+0.03) -80.3%
     5311.20: size    (16 days)              8.0M                2.0M -74.5%
     5311.21: client  (16 days)   0.40(0.47+0.03)     1.01(0.98+0.04) +152.5%
     5311.23: server  (32 days)   2.40(4.44+0.20)     0.31(0.26+0.04) -87.1%
     5311.24: size    (32 days)             14.1M                4.1M -70.9%
     5311.25: client  (32 days)   0.70(0.90+0.03)     1.81(1.75+0.06) +158.6%
     5311.27: server  (64 days)   11.76(26.57+0.29)   0.55(0.50+0.08) -95.3%
     5311.28: size    (64 days)             89.4M               47.4M -47.0%
     5311.29: client  (64 days)   5.71(9.31+0.27)     15.20(15.20+0.32) +166.2%
     5311.31: server (128 days)   16.15(36.87+0.40)   0.91(0.82+0.14) -94.4%
     5311.32: size   (128 days)            134.8M              100.4M -25.5%
     5311.33: client (128 days)   9.42(16.86+0.49)    25.34(25.80+0.46) +169.0%
    
    In all cases we save CPU time on the server (sometimes
    significant) and the resulting pack is smaller. We do spend
    more CPU time on the client side, because it has to
    reconstruct more deltas. But that's the right tradeoff to
    make, since clients tend to outnumber servers. It just means
    the thin pack mechanism is doing its job.
    
    From the user's perspective, the end-to-end time of the
    operation will generally be faster. E.g., in the 128-day
    case, we saved 15s on the server at a cost of 16s on the
    client. Since the resulting pack is 34MB smaller, this is a
    net win if the network speed is less than 270Mbit/s. And
    that's actually the worst case. The 64-day case saves just
    over 11s at a cost of just under 11s. So it's a slight win
    at any network speed, and the 40MB saved is pure bonus. That
    trend continues for the smaller fetches.
    
    The implementation itself is mostly straightforward, with
    the new logic going into check_object(). But there are two
    tricky bits.
    
    The first is that check_object() needs access to the
    relevant information (the thin flag and bitmap result). We
    can do this by pushing these into program-lifetime globals.
    
    The second is that the rest of the code assumes that any
    reused delta will point to another "struct object_entry" as
    its base. But of course the case we are interested in here
    is the one where don't have such an entry!
    
    I looked at a number of options that didn't quite work:
    
     - we could use a flag to signal a reused delta, but it's
       not a single bit. We have to actually store the oid of
       the base, which is normally done by pointing to the
       existing object_entry. And we'd have to modify all the
       code which looks at deltas.
    
     - we could add the reused bases to the end of the existing
       object_entry array. While this does create some extra
       work as later stages consider the extra entries, it's
       actually not too bad (we're not sending them, so they
       don't cost much in the delta search, and at most we'd
       have 2*N of them).
    
       But there's a more subtle problem. Adding to the existing
       array means we might need to grow it with realloc, which
       could move the earlier entries around. While many of the
       references to other entries are done by integer index,
       some (including ones on the stack) use pointers, which
       would become invalidated.
    
       This isn't insurmountable, but it would require quite a
       bit of refactoring (and it's hard to know that you've got
       it all, since it may work _most_ of the time and then
       fail subtly based on memory allocation patterns).
    
     - we could allocate a new one-off entry for the base. In
       fact, this is what an earlier version of this patch did.
       However, since the refactoring brought in by ad635e82
    
    
       (Merge branch 'nd/pack-objects-pack-struct', 2018-05-23),
       the delta_idx code requires that both entries be in the
       main packing list.
    
    So taking all of those options into account, what I ended up
    with is a separate list of "external bases" that are not
    part of the main packing list. Each delta entry that points
    to an external base has a single-bit flag to do so; we have a
    little breathing room in the bitfield section of
    object_entry.
    
    This lets us limit the change primarily to the oe_delta()
    and oe_set_delta_ext() functions. And as a bonus, most of
    the rest of the code does not consider these dummy entries
    at all, saving both runtime CPU and code complexity.
    
    Signed-off-by: default avatarJeff King <peff@peff.net>
    Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
    6a1e32d5