    shallow.c: the 8 steps to select new commits for .git/shallow · 58babfff
    Duy Nguyen authored
    Suppose a fetch or push is requested between two shallow repositories
    (with no history deepening or shortening). A pack that contains
    necessary objects is transferred over together with .git/shallow of
    the sender. The receiver has to determine whether it needs to update
    .git/shallow if new refs need new shallow commits.
    
    The rule here is to avoid updating .git/shallow by default. But we don't
    want to waste the received pack. If the pack contains two refs, one
    needs new shallow commits installed in .git/shallow and one does not,
    we keep the latter and reject/warn about the former.
    
    Even if .git/shallow update is allowed, we only add shallow commits
    strictly necessary for the former ref (remember the sender can send
    more shallow commits than necessary) and pay attention not to
    accidentally cut the receiver history short (no history shortening is
    asked for).
    
    So the steps to figure out which refs need which new shallow commits are:
    
    1. Split the sender's shallow commit list into "ours" and "theirs" lists
       using has_sha1_file. Those that exist in the current repo go in "ours",
       the rest in "theirs".
    
    2. Check the receiver .git/shallow, remove from "ours" the ones that
       also exist in .git/shallow.
    
    3. Fetch the new pack. Either install or unpack it.
    
    4. Do has_sha1_file on "theirs" list again. Drop the ones that fail
       has_sha1_file. Obviously the new pack does not need them.
    
    5. If the pack is kept, remove from "ours" the ones that do not exist
       in the new pack.
    
    6. Walk the new refs to answer the question "what shallow commits,
       both ours and theirs, are required in .git/shallow in order to add
       this ref?". Shallow commits not associated with any ref are removed
       from their respective list.
    
    7. (*) Check reachability (from the current refs) of all remaining
       commits in "ours". Those reachable are removed. We do not want to
       cut any part of our (reachable) history. Only commits are checked
       here; the full connectivity test is done by
       check_everything_connected() at the end as usual.
    
    8. Combine the final "ours" and "theirs" and add them all to
       .git/shallow. Install new refs. The case where some hook rejects
       some refs on a push is explained in more detail in the push
       patches.
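
    Steps 1 and 2 can be sketched as below. This is a minimal
    illustration, not the code in shallow.c: commit ids are plain strings,
    struct id_list, list_has(), list_add() and split_shallow() are
    hypothetical names, and a lookup in a toy "repo" list stands in for
    has_sha1_file().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_IDS 16

/* Toy list of commit ids; real Git works on object hashes instead. */
struct id_list {
	const char *ids[MAX_IDS];
	size_t nr;
};

/* Linear membership test, good enough for a sketch. */
static int list_has(const struct id_list *l, const char *id)
{
	for (size_t i = 0; i < l->nr; i++)
		if (!strcmp(l->ids[i], id))
			return 1;
	return 0;
}

static void list_add(struct id_list *l, const char *id)
{
	assert(l->nr < MAX_IDS);
	l->ids[l->nr++] = id;
}

/*
 * Step 1: sender shallow commits we already have go to "ours",
 *         the others go to "theirs".
 * Step 2: entries already recorded in the receiver's .git/shallow
 *         are left out of "ours".
 * "repo" is an assumption standing in for has_sha1_file().
 */
static void split_shallow(const struct id_list *sender_shallow,
			  const struct id_list *repo,
			  const struct id_list *receiver_shallow,
			  struct id_list *ours, struct id_list *theirs)
{
	ours->nr = theirs->nr = 0;
	for (size_t i = 0; i < sender_shallow->nr; i++) {
		const char *id = sender_shallow->ids[i];

		if (!list_has(repo, id))
			list_add(theirs, id);
		else if (!list_has(receiver_shallow, id))
			list_add(ours, id);
	}
}
```

    Steps 4 and 5 would filter the two lists again in the same style once
    the pack has arrived, using the new object store and the kept pack as
    the lookup sets.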
    
    Of these steps, #6 and #7 are expensive. Both require walking through
    some commits, or in the worst case all commits. We would rather avoid
    them, at least in the common case where the transferred pack does not
    contain any shallow commits that the sender advertises. Let's look at
    each scenario:
    
    1) the sender has longer history than the receiver
    
       All shallow commits from the sender will be put into the "theirs"
       list at step 1 because none of them exists in the current repo. In
       the common case, "theirs" becomes empty at step 4 and we exit early.
    
    2) the sender has shorter history than the receiver
    
       All shallow commits from the sender are likely in the "ours" list at
       step 1. In the common case, if the new pack is kept, we could empty
       "ours" and exit early at step 5.
    
       If the pack is not kept, we hit the expensive step 6 then exit
       after "ours" is emptied. There'll be only a handful of objects to
       walk in the fast-forward case. If it's a forced update, we may need
       to walk to the bottom.
    
    3) the sender has the same .git/shallow as the receiver
    
       This is similar to case 2 except that "ours" should be emptied at
       step 2 and we exit early.
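
    The early exits in the three scenarios can be summarised in a small
    decision function. This is only a sketch of the reasoning above:
    struct shallow_counts and shallow_exit_step() are hypothetical names,
    and the list sizes are assumed to have been computed already.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical snapshot of the list sizes at the points where the
 * scenarios say we may stop. Not a structure from shallow.c.
 */
struct shallow_counts {
	size_t theirs_step1; /* sender commits missing from our repo */
	size_t ours_step2;   /* "ours" minus entries already in .git/shallow */
	size_t theirs_step4; /* "theirs" entries the new pack really brought */
	size_t ours_step5;   /* "ours" entries found in the kept pack */
	int pack_kept;       /* nonzero: pack installed, zero: unpacked */
};

/*
 * Return the step after which both lists are empty (2, 4 or 5),
 * or 6 when the expensive ref walk cannot be avoided.
 */
static int shallow_exit_step(const struct shallow_counts *c)
{
	/* Step 4 only drops entries, it never adds any. */
	assert(c->theirs_step4 <= c->theirs_step1);

	if (!c->ours_step2 && !c->theirs_step1)
		return 2; /* case 3: same .git/shallow on both sides */
	if (!c->ours_step2 && !c->theirs_step4)
		return 4; /* case 1: sender has longer history */
	if (c->pack_kept && !c->ours_step5 && !c->theirs_step4)
		return 5; /* case 2 with a kept pack */
	return 6;         /* case 2 with an unpacked pack, or uncommon cases */
}
```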
    
    A fetch after "clone --depth=X" is case 1. A fetch after "clone" (from
    a shallow repo) is case 3. Luckily they're cheap for the common case.
    
    A push from "clone --depth=X" falls into case 2, which is expensive.
    Some more work may be done at the sender/client side to avoid more
    work on the server side: if the transferred pack does not contain any
    shallow commits, send-pack should not send any shallow commits to the
    receive-pack, effectively turning it into a normal push and avoiding
    all steps.
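
    The sender-side idea could look roughly like this. It is a sketch
    under stated assumptions, not what send-pack does today: ids are plain
    strings, the pack is modelled as a flat list of commit ids, and
    pack_contains() and shallow_to_send() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Does the (toy) pack contain the given commit id? */
static int pack_contains(const char *const *pack, size_t pack_nr,
			 const char *id)
{
	for (size_t i = 0; i < pack_nr; i++)
		if (!strcmp(pack[i], id))
			return 1;
	return 0;
}

/*
 * Return how many shallow commits the sender should advertise:
 * none when the pack carries no shallow commit (plain push),
 * otherwise the whole list.
 */
static size_t shallow_to_send(const char *const *shallow, size_t shallow_nr,
			      const char *const *pack, size_t pack_nr)
{
	assert(shallow || !shallow_nr);
	for (size_t i = 0; i < shallow_nr; i++)
		if (pack_contains(pack, pack_nr, shallow[i]))
			return shallow_nr;
	return 0;
}
```

    With an empty advertisement the receiver has nothing to split in
    step 1, so none of the later steps run.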
    
    This patch implements all steps except #3, already handled by
    fetch-pack and receive-pack, and #6 and #7, which have their own
    patches due to their size.
    
    (*) in previous versions step 7 was put before step 3. I reordered it
        so that the common case that keeps the pack does not need to walk
        commits at all. In the future, if we implement a faster commit
        reachability check (maybe with the help of pack bitmaps or commit
        cache), step 7 could become cheap and be moved up before 6 again.
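
    To make the cost of step 7 concrete, here is a toy version of the
    reachability check. It is an illustration only: commits are indices
    into a small array, struct toy_commit, mark_reachable() and
    drop_reachable() are hypothetical names, and the real code works on
    Git's commit objects rather than on this graph.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_COMMITS 32

/* Toy commit: up to two parents given as indices, -1 meaning "none". */
struct toy_commit {
	int parent[2];
};

/* Mark every commit reachable from "start" by following parent links. */
static void mark_reachable(const struct toy_commit *graph, int start,
			   unsigned char *seen)
{
	int stack[MAX_COMMITS]; /* each commit is pushed at most once */
	size_t top = 0;

	if (start < 0 || seen[start])
		return;
	seen[start] = 1;
	stack[top++] = start;
	while (top) {
		int c = stack[--top];

		for (int i = 0; i < 2; i++) {
			int p = graph[c].parent[i];

			if (p >= 0 && !seen[p]) {
				seen[p] = 1;
				stack[top++] = p;
			}
		}
	}
}

/*
 * Step 7: drop from "ours" every commit reachable from the current
 * refs, so that no reachable history is cut. Returns the new length.
 */
static size_t drop_reachable(const struct toy_commit *graph, size_t nr_commits,
			     const int *refs, size_t nr_refs,
			     int *ours, size_t nr_ours)
{
	unsigned char seen[MAX_COMMITS] = { 0 };
	size_t kept = 0;

	assert(nr_commits <= MAX_COMMITS);
	for (size_t i = 0; i < nr_refs; i++)
		mark_reachable(graph, refs[i], seen);
	for (size_t i = 0; i < nr_ours; i++)
		if (!seen[ours[i]])
			ours[kept++] = ours[i];
	return kept;
}
```

    The walk may visit every commit below the refs, which is why this
    step is deferred until the cheaper filters have run.
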
    Signed-off-by: Nguyễn Thái Ngọc Duy <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>