
git/housekeeping: Introduce geometric repacking strategy

One of the most important tasks performed by our repository housekeeping is repacking objects into packfiles so that they can be accessed efficiently. Unfortunately, this is generally also the most expensive part, as it is both CPU- and IOPS-intensive. So even though it would be most efficient to always pack all objects into a single packfile, this is not feasible, especially for large repositories.

We are thus using a hybrid approach right now where we alternate between incremental repacks that soak up all loose objects, and full repacks that repack all packfiles into a single packfile once there are too many of them. In highly active repositories this strategy is too inefficient though, as we accumulate incremental packfiles fast and then need to do the full repack quite regularly. Just bumping the limit on how many packfiles are allowed before the next full repack is not ideal either, as the repository becomes less efficient to serve the more packfiles there are.

To improve on this use case, Git v2.33.0 has introduced support for geometric repacking, which is a tradeoff between incremental and full repacks: instead of repacking either only loose objects or everything at once, a geometric repack arranges packfiles so that the end result forms a geometric sequence with regard to the number of objects contained in each pack. As a result, a geometric repack typically only rewrites a small slice of the packfiles.
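To make the geometric sequence idea more concrete, here is a simplified sketch in Go. It is only an illustration of the idea described above and not Git's actual implementation: the function name `packsToRewrite`, the factor of 2 and the object counts are assumptions made up for this example, and loose objects are ignored.

```go
package main

import (
	"fmt"
	"sort"
)

// packsToRewrite is a hypothetical helper. Given the object counts of the
// existing packfiles, it returns how many of the smallest packs would have to
// be merged into one new pack so that the resulting packs form a geometric
// sequence: each pack holds at least `factor` times as many objects as the
// next smaller one.
func packsToRewrite(objectCounts []int, factor int) int {
	if len(objectCounts) == 0 {
		return 0
	}

	// Work on a sorted copy so the caller's slice is left untouched.
	counts := append([]int(nil), objectCounts...)
	sort.Ints(counts)

	// Walk from the largest pack downwards until two neighbouring packs
	// violate the geometric progression.
	i := len(counts) - 1
	for ; i > 0; i-- {
		if counts[i] < factor*counts[i-1] {
			break
		}
	}

	// If a violation was found at index i, the packs at indices 0..i all
	// need to be rolled up. Otherwise nothing needs to be rewritten.
	split := i
	if split > 0 {
		split++
	}

	// The merged pack may now be too large relative to the next pack, so
	// keep absorbing packs until the progression holds again.
	total := 0
	for _, c := range counts[:split] {
		total += c
	}
	for split < len(counts) && counts[split] < factor*total {
		total += counts[split]
		split++
	}

	return split
}

func main() {
	// Made-up object counts for four packfiles.
	counts := []int{5000, 100, 1000, 120}
	fmt.Println("packs to merge:", packsToRewrite(counts, 2))
}
```

With these made-up counts, only the two smallest packs (100 and 120 objects) are merged, while the packs with 1000 and 5000 objects stay untouched. This is why a geometric repack usually rewrites far less data than a full repack.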

In the best case, we'd switch to using geometric repacks exclusively in order to reduce the overhead of repository maintenance. But there are two reasons why we cannot do so:

- Geometric repacks don't take reachability into account and will
  always include all loose objects. As we need to expire objects
  over time it means that we have to do a full repack with cruft
  packs every now and then to evict unreachable objects.

- Geometric repacks don't take delta islands into account. We use
  delta islands to make sure that deltas are only created against
  objects which are part of the default refspec used by clients,
  namely branches and tags. We thus need to regularly do a full
  repack that "freshens" our delta islands.

We're thus forced to keep alternating between full and geometric repacks, but we can do the full repacks a lot less frequently than before. As we don't have control over the number of packfiles anymore (it's controlled by the geometric sequence now), we need a different heuristic to decide whether to do one or the other. For now, we are going with time-based repacks where we only do a full repack once a certain amount of time has passed since the last one. This effectively puts a rate limit in place.

Implement this new strategy behind a feature flag.

Closes #4998 (closed).
