  • pack-objects: walk tag chains for --include-tag · b773ddea
    Jeff King authored and Junio C Hamano committed
    When pack-objects is given --include-tag, it peels each tag
    ref down to a non-tag object, and if that non-tag object is
    going to be packed, we include the tag, too. But what
    happens if we have a chain of tags (e.g., tag "A" points to
    tag "B", which points to commit "C")?
    
    We'll peel down to "C" and realize that we want to include
    tag "A", but we do not ever consider tag "B", leading to a
    broken pack (assuming "B" was not otherwise selected).
    Instead, we have to walk the whole chain, adding any tags we
    find to the pack.
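
    As a rough illustration of the difference, here is a toy
    model in Python. It is not git's code: the OBJECTS table and
    every function name are made up for this sketch, and it
    assumes the receiver has none of the objects already.

```python
# Toy object store: name -> (type, target). Hypothetical, not git's structures.
OBJECTS = {
    "A": ("tag", "B"),
    "B": ("tag", "C"),
    "C": ("commit", None),
}


def peel(name):
    """Follow tag links until we reach a non-tag object."""
    while OBJECTS[name][0] == "tag":
        name = OBJECTS[name][1]
    return name


def include_tag_old(tag_ref, to_pack):
    """Model of the old behavior: add only the outermost tag."""
    packed = set(to_pack)
    if peel(tag_ref) in packed:
        packed.add(tag_ref)
    return packed


def include_tag_new(tag_ref, to_pack):
    """Model of the fixed behavior: add every tag found along the chain."""
    packed = set(to_pack)
    if peel(tag_ref) in packed:
        name = tag_ref
        while OBJECTS[name][0] == "tag":
            packed.add(name)
            name = OBJECTS[name][1]
    return packed


def is_closed(packed):
    """In this model a pack is broken if a packed tag points outside it."""
    return all(
        OBJECTS[name][1] in packed
        for name in packed
        if OBJECTS[name][0] == "tag"
    )
```

    With only "C" selected, include_tag_old("A", {"C"}) yields
    a set containing "A" and "C" but not "B", which is_closed()
    rejects; include_tag_new("A", {"C"}) adds "B" as well.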
    
    Interestingly, it doesn't seem possible to trigger this
    problem with "git fetch", but you can with "git clone
    --single-branch". The reason is that we generate the correct
    pack when the client explicitly asks for "A" (because we do
    a real reachability analysis there), and "fetch" is more
    willing to do so. There are basically two cases:
    
      1. If "C" is already a ref tip, then the client can deduce
         that it needs "A" itself (via find_non_local_tags), and
         will ask for it explicitly rather than relying on the
         include-tag capability. Everything works.
    
      2. If "C" is not already a ref tip, then we hope for
         include-tag to send us the correct tag. But it doesn't;
         it generates a broken pack. However, the next step is
         to do a follow-up run of find_non_local_tags(),
         followed by fetch_refs() to backfill any tags we
         learned about.
    
         In the normal case, fetch_refs() calls quickfetch(),
         which does a connectivity check and sees we have no
         new objects to fetch. We just write the refs.
    
         But for the broken-pack case, the connectivity check
         fails, and quickfetch will follow up with the remote,
         asking explicitly for each of the ref tips. This picks
         up the missing object in a new pack.
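
    The backfill in case 2 can be sketched with another toy
    model. Again, this is not how fetch_refs() or quickfetch()
    are implemented; the function names and the OBJECTS table
    are hypothetical, and the model cheats by letting the
    client consult the full object graph.

```python
# Toy object graph: name -> (type, target). Hypothetical, not git's structures.
OBJECTS = {
    "A": ("tag", "B"),
    "B": ("tag", "C"),
    "C": ("commit", None),
}


def reachable(tips):
    """Collect every object reachable from the given tips."""
    seen = set()
    stack = list(tips)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        target = OBJECTS[name][1]
        if target is not None:
            stack.append(target)
    return seen


def connected(tips, have):
    """Toy connectivity check: is everything reachable from tips present?"""
    return reachable(tips) <= have


def backfill(tag_tips, have):
    """Return (objects after backfill, whether a second fetch was needed)."""
    if connected(tag_tips, have):
        # Normal case: nothing new to fetch, just write the refs.
        return set(have), False
    # Broken-pack case: ask explicitly for the tips, which makes the
    # server do a real reachability walk and send what is missing.
    return set(have) | reachable(tag_tips), True
```

    Starting from a broken pack that delivered "A" and "C",
    backfill(["A"], {"A", "C"}) reports that a second fetch was
    needed and ends up with "B" as well, which is why "git
    fetch" hides the bug.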
    
    For a regular "git clone", we are similarly OK, because we
    explicitly request all of the tag refs, and get a correct
    pack. But with "--single-branch", we kick in tag
    auto-following via "include-tag", but do _not_ do a
    follow-up backfill. We just take whatever the server sent us
    via include-tag and write out tag refs for any tag objects
    we were sent. So prior to c6807a40 (clone: open a shortcut
    for connectivity check, 2013-05-26), we actually claimed the
    clone was a success, but the result was silently
    corrupted! Since c6807a40, index-pack's connectivity
    check catches this case, and we correctly complain.
    
    The included test directly checks that pack-objects does not
    generate a broken pack, but also confirms that "clone
    --single-branch" does not hit the bug.
    
    Note that tag chains introduce another interesting question:
    if we are packing the tag "B" but not the commit "C", should
    "A" be included?
    
    Both before and after this patch, we do not include "A",
    because the initial peel_ref() check only knows about the
    bottom-most level, "C". To realize that "B" is involved at
    all, we would have to switch to an incremental peel, in
    which we examine each tagged object, asking if it is being
    packed (and including the outer tag if so).
    
    But that runs contrary to the optimizations in peel_ref(),
    which avoid accessing the objects at all, in favor of using
    the value we pull from packed-refs. It's OK to walk the
    whole chain once we know we're going to include the tag (we
    have to access it anyway, so the effort is proportional to
    the pack we're generating). But for the initial selection,
    we have to look at every ref. If we're only packing a few
    objects, we'd still have to parse every single referenced
    tag object just to confirm that it isn't part of a tag
    chain.
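
    To make the two selection strategies concrete, here is a
    toy comparison in Python. It is only a sketch: the names
    are hypothetical, and in real git the full peel comes from
    packed-refs without touching objects, whereas the
    incremental variant would have to read each tag object.

```python
# Toy object store: name -> (type, target). Hypothetical, not git's structures.
OBJECTS = {
    "A": ("tag", "B"),
    "B": ("tag", "C"),
    "C": ("commit", None),
}


def peel(name):
    """Fully peel a tag down to its non-tag object."""
    while OBJECTS[name][0] == "tag":
        name = OBJECTS[name][1]
    return name


def wants_tag_full_peel(tag_ref, packed):
    """Current selection rule: look only at the bottom-most object."""
    return peel(tag_ref) in packed


def wants_tag_incremental(tag_ref, packed):
    """Hypothetical rule: check every tagged object along the chain."""
    name = tag_ref
    while OBJECTS[name][0] == "tag":
        name = OBJECTS[name][1]  # one object access per link
        if name in packed:
            return True
    return False
```

    When only "B" is being packed, wants_tag_full_peel("A",
    {"B"}) is false while wants_tag_incremental("A", {"B"}) is
    true. The incremental loop runs for every tag ref,
    regardless of how small the pack is, which is the cost
    described above.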
    
    This could be addressed if packed-refs stored the complete
    tag chain for each peeled ref (in most cases, this would be
    the same cost as now, as each "chain" is only a single
    link). But given the size of that project, it's out of scope
    for this fix (and probably nobody cares enough anyway, as
    it's such an obscure situation). This commit limits itself
    to just avoiding the creation of a broken pack.
    
    Signed-off-by: Jeff King <peff@peff.net>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>