• Jeff King's avatar
    pack-objects: enforce --depth limit in reused deltas · 7dbabbbe
    Jeff King authored
    Since 898b14ce (pack-objects: rework check_delta_limit usage,
    2007-04-16), we check the delta depth limit only when
    figuring out whether we should make a new delta. We don't
    consider it at all when reusing deltas, which means that
    packing once with --depth=250, and then again with
    --depth=50, the second pack may still contain chains larger
    than 50.
    This is generally considered a feature, as the results of
    earlier high-depth repacks are carried forward, used for
    serving fetches, etc. However, since we started using
    cross-pack deltas in c9af708b (pack-objects: use mru list
    when iterating over packs, 2016-08-11), we are no longer
    bounded by the length of an existing delta chain in a single
    Here's one particular pathological case: a sequence of N
    packs, each with 2 objects, the base of which is stored as a
    delta in a previous pack. If we chain all the deltas
    together, we have a cycle of length N. We break the cycle,
    but the tip delta is still at depth N-1.
    This is less unlikely than it might sound. See the included
    test for a reconstruction based on real-world actions.  I
    ran into such a case in the wild, where a client was rapidly
    sending packs, and we had accumulated 10,000 before doing a
    server-side repack.  The pack that "git repack" tried to
    generate had a very deep chain, which caused pack-objects to
    run out of stack space in the recursive write_one().
    This patch bounds the length of delta chains in the output
    pack based on --depth, regardless of whether they are caused
    by cross-pack deltas or existed in the input packs. This
    fixes the problem, but does have two possible downsides:
      1. High-depth aggressive repacks followed by "normal"
         repacks will throw away the high-depth chains.
         In the long run this is probably OK; investigation
         showed that high-depth repacks aren't actually
         beneficial, and we dropped the aggressive depth default
         to match the normal case in 07e7dbf0 (gc: default
         aggressive depth to 50, 2016-08-11).
      2. If you really do want to store high-depth deltas on
         disk, they may be discarded and new delta computed when
         serving a fetch, unless you set pack.depth to match
         your high-depth size.
    The implementation uses the existing search for delta
    cycles.  That lets us compute the depth of any node based on
    the depth of its base, because we know the base is DFS_DONE
    by the time we look at it (modulo any cycles in the graph,
    but we know there cannot be any because we break them as we
    see them).
    There is some subtlety worth mentioning, though. We record
    the depth of each object as we compute it. It might seem
    like we could save the per-object storage space by just
    keeping track of the depth of our traversal (i.e., have
    break_delta_chains() report how deep it went). But we may
    visit an object through multiple delta paths, and on
    subsequent paths we want to know its depth immediately,
    without having to walk back down to its final base (doing so
    would make our graph walk quadratic rather than linear).
    Likewise, one could try to record the depth not from the
    base, but from our starting point (i.e., start
    recursion_depth at 0, and pass "recursion_depth + 1" to each
    invocation of break_delta_chains()). And then when
    recursion_depth gets too big, we know that we must cut the
    delta chain.  But that technique is wrong if we do not visit
    the nodes in topological order. In a chain A->B->C, it
    if we visit "C", then "B", then "A", we will never recurse
    deeper than 1 link (because we see at each node that we have
    already visited it).
    Signed-off-by: default avatarJeff King <peff@peff.net>
    Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
pack-objects.h 2.16 KB