git/housekeeping: Implement support for geometric repacking
Ideally, objects in repositories would only ever be contained in a single packfile so that they can be efficiently searched and served during normal operations. Unfortunately, writing packfiles becomes more and more expensive the larger the repository becomes. As a consequence, packing objects needs to be based on a compromise between keeping the number of overall packfiles low and keeping the overhead of the repack itself low.
The current approach implemented by Gitaly is to accept a certain number of packfiles that scales with the repository size: the larger the repo, the more packfiles we accept. But once the threshold has been reached, we perform a full repack that moves all objects into a single packfile. This heuristic is better than what we had a year ago, but it is still suboptimal in the context of large monorepositories where the number of packfiles will grow very fast. We thus still end up repacking such repositories quite frequently.
To help with this exact scenario, git-repack(1) has gained support for geometric repacking in Git v2.32.0. With this mode, git-repack(1) will make sure that packfiles form a geometric sequence: each packfile must contain at least r times as many objects as the next-smaller packfile, where r is the configured geometric factor. If this condition is violated, Git will pick the smallest set of packfiles that need to be merged in order to restore the geometric sequence. This way, we typically only have to merge a much smaller set of objects into a single packfile compared to the previous mode, where we had to perform a full repack regularly.
Gitaly will soon start to make use of geometric repacking as well. As a preparatory step, implement the infrastructure in the form of a new repacking strategy, together with tests that verify that the new command behaves as we expect.
The new infrastructure is not yet wired up and will thus not get hit via production traffic.
Part of #4998.