git/housekeeping: Implement support for geometric repacking

Patrick Steinhardt requested to merge pks-housekeeping-geometric-repacking into master

Ideally, objects in repositories would only ever be contained in a single packfile so that they can be efficiently searched and served during normal operations. Unfortunately, writing packfiles becomes more and more expensive the larger the repository becomes. As a consequence, packing objects needs to be based on a compromise between keeping the number of overall packfiles low and keeping the overhead of the repack itself low.

The current approach implemented by Gitaly is to accept a certain number of packfiles that scales with the repository size: the larger the repo, the more packfiles we accept. Once that threshold has been reached, we perform a full repack that moves all objects into a single packfile. This heuristic is better than what we had a year ago, but it is still suboptimal for large monorepositories, where the number of packfiles grows very quickly. We thus still end up fully repacking such repositories quite frequently.

To help with this exact scenario, git-repack(1) has gained support for geometric repacking in Git v2.32.0. With this mode, git-repack(1) makes sure that packfiles form a geometric sequence: when packfiles are ordered by object count, each packfile must contain at least r times as many objects as the next-smaller one, where r is the configured factor. If this condition is violated, Git picks the smallest set of packfiles that need to be merged in order to restore the geometric sequence. This way, we typically only have to merge a much smaller set of objects into a single packfile compared to the previous mode, where we had to perform a full repack regularly.
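
To make the geometric condition concrete, here is a minimal Go sketch of how the set of packfiles to merge could be determined from their object counts. It is a simplified model for illustration only and is neither Git's nor Gitaly's actual implementation. The function name `packsToMerge` and its signature are hypothetical, and overflow of the multiplication is ignored.

```go
package main

import "sort"

// packsToMerge is a hypothetical helper. Given the object count of each
// packfile and a factor r, it returns the object counts of the packfiles
// that would have to be merged into one new packfile so that the remaining
// packfiles form a geometric sequence. An empty result means the sequence
// is already geometric.
func packsToMerge(objectCounts []uint64, factor uint64) []uint64 {
	if len(objectCounts) == 0 {
		return nil
	}

	// Work on a sorted copy, smallest packfile first.
	packs := append([]uint64(nil), objectCounts...)
	sort.Slice(packs, func(i, j int) bool { return packs[i] < packs[j] })

	// Walk from the largest packfile down until a pair violates the
	// condition "larger >= factor * smaller".
	split := len(packs) - 1
	for ; split > 0; split-- {
		if packs[split] < factor*packs[split-1] {
			break
		}
	}
	// If a violation was found, the larger packfile of the violating pair
	// has to be merged as well.
	if split > 0 {
		split++
	}

	// The merged packfile may itself be too large relative to the next
	// packfile, so keep absorbing packfiles until the sequence holds.
	var merged uint64
	for _, count := range packs[:split] {
		merged += count
	}
	for split < len(packs) && packs[split] < factor*merged {
		merged += packs[split]
		split++
	}

	return packs[:split]
}
```

With object counts 100, 1, 2 and 1 and a factor of 2, this sketch selects the three small packfiles (1, 1 and 2 objects) and leaves the packfile with 100 objects untouched. For counts 1, 2, 4 and 8 nothing needs to be merged.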

Gitaly will soon start to make use of geometric repacking as well. As a preparatory step, implement the infrastructure in the form of a new repacking strategy, together with a set of tests that pin down the expected behavior of the new command.

The new infrastructure is not yet wired up and is therefore not exercised by production traffic.

Part of #4998 (closed).