
housekeeping: Implement logic to write cruft packs

Patrick Steinhardt requested to merge pks-housekeeping-cruft-packs into master

In order to avoid races during garbage collection, Git only deletes unreachable objects after a grace period has elapsed. This is why repacking a repository explodes all unreachable objects into loose objects: it allows Git to track the last time each of these objects was accessed. This is a significant problem though, as loose objects cannot be stored deltified and may slow down Git commands when there are too many of them. Storing them in a plain packfile does not work either, because the access time of every object in that packfile would be freshened whenever any single one of them is accessed.

To solve this issue, Git introduced cruft packs in v2.37.0. This mechanism allows Git to keep storing unreachable objects in a pack that is annotated with a .mtimes data structure. This data structure tracks per-object access times, each of which can be updated separately whenever the corresponding object is accessed.

Cruft packs also give us some advantages that are more Gitaly-specific:

- We can use it as a stepping stone to efficiently compute a
  repository's size while excluding unreachable objects that are
  part of a cruft pack.

- It gives us better insight into which objects are truly
  unreachable and which are reachable. This has been an issue with
  our current metrics, which only discern recent and unreachable
  objects and have thus caused us to underestimate the number of
  unreachable objects.

- In the general case, we will likely be able to skip running
  git-prune(1) more often, as repositories typically will not have
  loose objects anymore.

Cruft packs are generated when git-repack(1) is executed, and only when doing a full repack. In that case the command writes not one packfile but two, segregated into reachable and unreachable objects. Furthermore, the command removes any objects that were part of a cruft pack and that were last accessed before the grace period.

Implement the logic to write cruft packs in our repository housekeeping.

Closes #4351 (closed).

Edited by Patrick Steinhardt
