Skip to content

git: Speed up git-upload-pack(1) via new bitmap boundary traversal

Patrick Steinhardt requested to merge pks-pack-boundary-traversal into master

During packfile negotiation, git-upload-pack(1) needs to figure out which objects to send to the client side. This is done by separating the object graph into two sets:

- "Haves" are objects that the client already has available locally
  and that the server thus does not need to send.

- "Wants" are objects that the clients wants to fetch.

The set of objects that git-upload-pack(1) needs to send is then the set of wants with all haves excluded. The naive way to figure this out is to walk through both of these sets and paint the objects accordingly. This can be quite expensive though as history and thus the amount of objects that need to be painted grows.

Git has thus introduced bitmaps which are generated for a subset of the commits. For any such commit, the bitmap will immediately identify all objects that are transitively reachable from that specific commit. So as soon as we hit such a commit that has an associated bitmap, we can abort the graph walk as we already know transatively reachable objects from thereon.

The current algorithm that's used by Git does not perform well though in repositories that have sparse bitmap coverage of commits, which is the case especially in large monorepositories with many references. The chance that any such bitmap is going to be beneficial diminishes as it becomes less likely that we eventually hit the bitmapped objects. The consequence is that the bitmap walk is oftentimes slower than the normal walk in such cases.

To address the issue, Git has introduced a new algorithm in commit b0afdce5da (pack-bitmap.c: use commit boundary during bitmap traversal, 2023-05-08). Instead of biulding up a complete bitmap first, the new algorithm only starts to build up bitmaps when the boundary between haves and wants is hit. This reduces the overhead of bitmaps by just building up bitmap coverage when required.

This new algorithm has the downside that it's not exact anymore and may thus lead to too many objects being sent over the wire. This is not a huge problem in general though, as the normal non-bitmapped walk already has the same property. This is traded in for increased performance though especially in the case where there are many haves and only a few wants.

This commit enables the new traversal for git-upload-pack(1), which does exactly such a large reachability check via git rev-list --not --all in its connectivity check. It's not yet clear whether this will indeed result in a speedup, but by putting this behind a feature flag we should be in a good position to test.

Part of Experiment: Use bitmap boundary traversals to s... (#5537 - closed).

Merge request reports