Skip to content
  • Patrick Steinhardt's avatar
    git: Speed up git-upload-pack(1) via new bitmap boundary traversal · 678fcadf
    Patrick Steinhardt authored
    During packfile negotiation, git-upload-pack(1) needs to figure out
    which objects to send to the client side. This is done by separating the
    object graph into two sets:
    
        - "Haves" are objects that the client already has available locally
          and that the server thus does not need to send.
    
        - "Wants" are objects that the clients wants to fetch.
    
    The set of objects that git-upload-pack(1) needs to send is then the set
    of wants with all haves excluded. The naive way to figure this out is to
    walk through both of these sets and paint the objects accordingly. This
    can be quite expensive though as history and thus the amount of objects
    that need to be painted grows.
    
    Git has thus introduced bitmaps which are generated for a subset of the
    commits. For any such commit, the bitmap will immediately identify all
    objects that are transitively reachable from that specific commit. So as
    soon as we hit such a commit that has an associated bitmap, we can abort
    the graph walk as we already know transatively reachable objects from
    thereon.
    
    The current algorithm that's used by Git does not perform well though in
    repositories that have sparse bitmap coverage of commits, which is the
    case especially in large monorepositories with many references. The
    chance that any such bitmap is going to be beneficial diminishes as it
    becomes less likely that we eventually hit the bitmapped objects. The
    consequence is that the bitmap walk is oftentimes slower than the normal
    walk in such cases.
    
    To address the issue, Git has introduced a new algorithm in commit
    b0afdce5da (pack-bitmap.c: use commit boundary during bitmap traversal,
    2023-05-08). Instead of biulding up a complete bitmap first, the new
    algorithm only starts to build up bitmaps when the boundary between
    haves and wants is hit. This reduces the overhead of bitmaps by just
    building up bitmap coverage when required.
    
    This new algorithm has the downside that it's not exact anymore and may
    thus lead to too many objects being sent over the wire. This is not a
    huge problem in general though, as the normal non-bitmapped walk already
    has the same property. This is traded in for increased performance
    though especially in the case where there are many haves and only a few
    wants.
    
    This commit enables the new traversal for git-upload-pack(1), which does
    exactly such a large reachability check via `git rev-list --not --all`
    in its connectivity check. It's not yet clear whether this will indeed
    result in a speedup, but by putting this behind a feature flag we should
    be in a good position to test.
    
    [1]: https://github.blog/2023-08-21-highlights-from-git-2-42/#faster-object-traversals-with-bitmaps
    678fcadf