Use core delta islands to increase opportunities for pack reuse
Problem to solve
Depending on the structure of the packfile, cloning can be CPU- and memory-intensive. Ideally, packfile reuse would kick in and stream the packfile directly from disk, but currently this is unlikely to happen.
I think we may be able to use delta islands to further accelerate full git clones.
If we do it right, after a full repack with delta islands, the start of the new packfile will hold a "core island" pseudo-pack that contains all objects reachable from refs/heads or refs/tags at the time of the repack. This is the bulk of a clone.
I think that if we use a pack-objects hook, we can take advantage of this pseudo-pack inside the main pack. With enough metadata about the pseudo-pack, we can reduce the server-side workload of a git clone to roughly that of an incremental fetch relative to the pseudo-pack.
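To make "an incremental fetch relative to the pseudo-pack" concrete: assuming the hook runs the real git pack-objects with --revs, whose standard input takes rev-list style lines including a --not separator, the incremental pack could be requested by listing the client's wants, then --not, then the recorded pseudo-pack tips. The helper name and its inputs are hypothetical; this is a sketch, not a tested hook.

```python
from typing import Iterable


def incremental_revs_input(wants: Iterable[str], pseudo_pack_tips: Iterable[str]) -> str:
    """Build stdin for `git pack-objects --revs` (hypothetical helper).

    Objects reachable from the wants are packed, except those also
    reachable from the pseudo-pack tips, which the pseudo-pack already holds.
    """
    lines = list(wants)             # interesting: what the client asked for
    lines.append("--not")           # following revs are marked uninteresting
    lines.extend(pseudo_pack_tips)  # tips recorded when the core island was packed
    return "\n".join(lines) + "\n"
```

The resulting text would replace the client-derived input that the hook would otherwise forward to pack-objects; everything else about the invocation could stay as upload-pack requested it.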
The way I envision this, every time we do a full repack or GC, we would create an extra metadata file to accompany the packfile, describing what we need to know about the pseudo-pack inside it. During a clone, the pack-objects hook can then do a stateless check: (a) is this a full clone (if not, fall back to normal pack-objects), and (b) do we have pseudo-pack metadata? If both conditions are met, we tell the real pack-objects to construct only an incremental pack, and we glue that together with the pseudo-pack to form the response. The pseudo-pack will probably be over 90% of the response, and for it we only have to copy bytes from disk, write a new pack header with the combined object count, and compute a SHA-1 checksum over the stream. This would sidestep a lot of CPU work.
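A minimal sketch of the check and the gluing step, assuming the hook already knows the client's haves and has loaded the metadata (or None). It treats "no haves" as a full clone, which is looser than a real check should be: a client fetching a single branch into an empty repository also sends no haves. The gluing relies on the pack format: a 12-byte header (PACK signature, version, object count, all big-endian), the entries, then a 20-byte SHA-1 of everything before it. Because the pseudo-pack bytes land at the same offset as in the main pack (right after the header), their OFS_DELTA offsets stay valid. The incremental pack's entries are copied as one block, so their relative offsets stay valid too. All function names are hypothetical, and a SHA-1 (not SHA-256) repository is assumed.

```python
import hashlib
import io
import struct
from typing import BinaryIO, Optional, Sequence

# Pack header: 4-byte signature, 4-byte version, 4-byte object count (big-endian).
PACK_HEADER = struct.Struct(">4sII")
TRAILER_LEN = 20  # SHA-1 checksum; assumes a SHA-1 repository


def should_splice(haves: Sequence[str], metadata: Optional[object]) -> bool:
    """Stateless check: (a) looks like a full clone, (b) metadata exists."""
    return len(haves) == 0 and metadata is not None


def splice_packs(pseudo_entries: bytes, pseudo_count: int,
                 incremental_pack: bytes, out: BinaryIO) -> None:
    """Write one pack: new header, pseudo-pack entries, incremental entries, checksum.

    pseudo_entries: main-pack bytes from just after its header up to the cutoff.
    incremental_pack: a complete pack produced by the real pack-objects.
    """
    signature, version, incr_count = PACK_HEADER.unpack_from(incremental_pack, 0)
    if signature != b"PACK":
        raise ValueError("incremental pack has no PACK signature")

    hasher = hashlib.sha1()

    def emit(chunk: bytes) -> None:
        hasher.update(chunk)  # the trailer covers every byte written before it
        out.write(chunk)

    emit(PACK_HEADER.pack(b"PACK", version, pseudo_count + incr_count))
    emit(pseudo_entries)  # a real hook would stream this from disk in chunks
    emit(incremental_pack[PACK_HEADER.size:-TRAILER_LEN])  # drop its header and trailer
    out.write(hasher.digest())


def splice_to_bytes(pseudo_entries: bytes, pseudo_count: int,
                    incremental_pack: bytes) -> bytes:
    """Convenience wrapper that collects the spliced pack in memory."""
    buf = io.BytesIO()
    splice_packs(pseudo_entries, pseudo_count, incremental_pack, buf)
    return buf.getvalue()
```

Taking the version from the incremental pack assumes it matches the main pack's version (normally both are version 2).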
As I see it, there are just two things we need to know to make this work:
- How do we find out which part of the main packfile is the pseudo-pack, i.e. where is the cutoff point?
- How do we recover the exact list of tips that corresponds to the pseudo-pack?
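For the cutoff question, one possible approach, sketched under the assumption that the core island's objects sit contiguously at the start of the pack: list every object's offset (for example from git show-index, which prints an offset and object ID per object), compute the set of objects reachable from the recorded tips (for example with git rev-list --objects), then walk the objects in offset order. The first object outside that set marks the end of the pseudo-pack. The helper name is hypothetical. It returns None when the leading run does not contain every core object, in which case the hook should fall back to normal pack-objects.

```python
from typing import Iterable, Optional, Set, Tuple


def find_cutoff(entries: Iterable[Tuple[int, str]], core_objects: Set[str],
                pack_end: int) -> Optional[Tuple[int, int]]:
    """Return (cutoff_offset, object_count) for the pseudo-pack, or None.

    entries: (offset, oid) for every object in the main pack.
    core_objects: oids reachable from the recorded tips.
    pack_end: offset where the 20-byte trailer starts (pack size - 20).
    """
    count = 0
    cutoff = pack_end  # if every object is core, the region runs up to the trailer
    for offset, oid in sorted(entries):  # sort by offset
        if oid not in core_objects:
            cutoff = offset  # first non-core object ends the pseudo-pack
            break
        count += 1
    # The layout only helps if the leading run holds *all* core objects.
    if count != len(core_objects):
        return None
    return cutoff, count
```

With a result (cutoff, count), the pseudo-pack bytes would be main_pack[12:cutoff], and count is what goes into the spliced pack's header.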
Do you have any insight on this, @chriscool?
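This is what pack.islandCore is for: it names an island whose objects are packed first, creating a kind of pseudo-pack at the front of the pack. The capture group (e) gives both refs/heads and refs/tags the island name "e", so pack.islandCore=e selects them together, with something like: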
git -C gitlab-ce.git -c pack.island='r(e)fs/heads' -c pack.island='r(e)fs/tags' -c pack.islandCore=e repack -iadb
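For the second question, one option is to snapshot the refs right before running the repack above and store them in the metadata file next to the pack. The sketch below assumes a hypothetical line-based format (pack, cutoff, count, and one tip line per ref) and parses the output of git for-each-ref --format='%(objectname) %(refname)' refs/heads refs/tags. Refs can move while the repack runs, so the hook should still verify that the recorded tips match the pack before trusting them.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PseudoPackMeta:
    """Hypothetical metadata stored next to pack-<hash>.pack."""
    pack_name: str  # which pack this metadata describes
    cutoff: int     # byte offset where the pseudo-pack ends
    count: int      # number of objects in the pseudo-pack
    tips: Dict[str, str] = field(default_factory=dict)  # refname -> oid

    def dumps(self) -> str:
        lines = [f"pack {self.pack_name}", f"cutoff {self.cutoff}", f"count {self.count}"]
        lines += [f"tip {oid} {ref}" for ref, oid in sorted(self.tips.items())]
        return "\n".join(lines) + "\n"

    @classmethod
    def loads(cls, text: str) -> "PseudoPackMeta":
        values: Dict[str, str] = {}
        tips: Dict[str, str] = {}
        for line in text.splitlines():
            key, _, rest = line.partition(" ")
            if key == "tip":
                oid, _, ref = rest.partition(" ")
                tips[ref] = oid
            else:
                values[key] = rest
        return cls(values["pack"], int(values["cutoff"]), int(values["count"]), tips)


def parse_for_each_ref(output: str) -> Dict[str, str]:
    """Parse `git for-each-ref --format='%(objectname) %(refname)'` output."""
    tips: Dict[str, str] = {}
    for line in output.splitlines():
        oid, _, ref = line.partition(" ")
        tips[ref] = oid
    return tips
```

Keeping the format line-based means a hook in any language can read it without extra dependencies; the refname-to-oid mapping answers "which tips", while cutoff and count answer "where".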