
GarbageCollect and RepackFull should have early return heuristics

For historical reasons, gitlab-rails will ask Gitaly to repack a Git repository if it thinks there have been "enough" recent changes. Sometimes this guess is plain wrong and there have been no changes at all. See gitlab-com/gl-infra/scalability#20.

Repacks are both CPU and IO intensive. It is worth avoiding them.

We could add a heuristic to the RepackFull and GarbageCollect RPCs that checks whether anything has changed in the object store of the repo, and returns early if not. Something like:

  • List all packfiles and loose objects. If there are no loose objects, no objects/info/alternates file, and just one packfile, return early and do nothing.

This can be refined over time, but I suspect this one check catches a fair number of mirrored repos that never change.
