
repository: Add new RPC to prune unreachable objects

Patrick Steinhardt requested to merge pks-prune-unreachable-objects into master

When rewriting the repository's history with the BFG Repo-Cleaner, we can accumulate a very large number of unreachable objects in the repository's object database. By default, we'd only clean up those objects after two weeks, which is a rather long time to sit on so many objects. To address this use case, our GarbageCollect RPC gained a prune parameter: if set, we prune unreachable objects that haven't been touched during the last 30 minutes.

The problem with this, though, is that GarbageCollect does a lot more than just pruning objects: it may end up packing objects or references, writing commit-graphs, writing bitmaps, or doing other things. We want to control all of these ourselves, but instead we let git-gc(1) dictate how the repository is packed.

We're thus about to deprecate all RPCs which directly influence how a repository is packed in favor of OptimizeRepository: this is our "black box" RPC that, from the viewpoint of the caller, does something with the repository to make it great again. And this is by design: callers should not control the way Gitaly handles repository maintenance.

This highlights the need for a new RPC that only prunes objects which have become unreachable, disentangling pruning from the other repository maintenance tasks. This commit thus introduces PruneUnreachableObjects, a new RPC that does exactly that: any unreachable loose object that hasn't been touched in the last 30 minutes is pruned.
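The selection rule described above can be sketched as a small filter. This is a minimal illustration under stated assumptions, not Gitaly's implementation: the `LooseObject` type, its fields, and the `prunable` helper are hypothetical, and reachability is assumed to be already known. It only shows that an object must be both unreachable and older than the grace period to qualify.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List

# Assumed grace period, mirroring the 30 minutes described above.
GRACE_PERIOD = timedelta(minutes=30)


@dataclass(frozen=True)
class LooseObject:
    """Hypothetical view of one loose object file in the object database."""

    oid: str
    mtime: datetime  # last time the file was written or "freshened"
    reachable: bool  # whether any ref or other root still reaches it


def prunable(objects: Iterable[LooseObject], now: datetime,
             grace: timedelta = GRACE_PERIOD) -> List[str]:
    """Return the IDs of loose objects that may be pruned.

    An object qualifies only if it is unreachable *and* its mtime is at or
    before the cutoff. Recently touched objects are kept, because a
    concurrent writer may be about to reference them.
    """
    cutoff = now - grace
    return [o.oid for o in objects if not o.reachable and o.mtime <= cutoff]


if __name__ == "__main__":
    now = datetime(2022, 3, 1, 12, 0)
    objects = [
        LooseObject("old-unreachable", now - timedelta(hours=2), False),
        LooseObject("fresh-unreachable", now - timedelta(minutes=5), False),
        LooseObject("old-reachable", now - timedelta(hours=2), True),
    ]
    print(prunable(objects, now))  # ['old-unreachable']
```

The freshly written unreachable object survives because it is still inside the grace period, which is exactly the window that protects concurrent writers from having objects deleted underneath them.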

Note that to make this work correctly, the caller has to make two RPC calls: the first, to OptimizeRepository, is required to explode unreachable objects into loose objects; 30 minutes later, the caller may prune these objects with a second call to PruneUnreachableObjects.
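From the caller's side, the two-call sequence can be sketched as follows. The `MaintenanceClient` protocol, its snake_case method names, and the `purge_unreachable` helper are hypothetical stand-ins for a client of the two RPCs, not Gitaly's generated client API; the sleep function is injected so the ordering can be checked without actually waiting half an hour.

```python
import time
from typing import Callable, Protocol

# The grace period from the description, in seconds.
GRACE_PERIOD_SECONDS = 30 * 60


class MaintenanceClient(Protocol):
    """Hypothetical stand-in for a client of the two RPCs; not Gitaly's API."""

    def optimize_repository(self, repo: str) -> None: ...

    def prune_unreachable_objects(self, repo: str) -> None: ...


def purge_unreachable(client: MaintenanceClient, repo: str,
                      sleep: Callable[[float], None] = time.sleep) -> None:
    # First call: OptimizeRepository turns unreachable packed objects into
    # loose objects, the only kind of object Git will prune.
    client.optimize_repository(repo)
    # Wait out the grace period. A real caller would more likely schedule a
    # background job than block a thread for half an hour.
    sleep(GRACE_PERIOD_SECONDS)
    # Second call: prune unreachable loose objects older than the grace period.
    client.prune_unreachable_objects(repo)
```

The important property is the ordering: calling PruneUnreachableObjects first, or without waiting, would leave objects that are still packed, or still inside their grace period, untouched.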

This is no different from how things work right now, even though it's hidden away and (naturally) used incorrectly by Rails: GarbageCollect would need to be called twice, first to explode unreachable objects into loose objects and then a second time with prune=true to prune them after half an hour. This is because Git only ever considers loose objects for pruning, and the grace period is determined by inspecting each object's modification time. So the way Rails does this is broken, and the new RPC doesn't change that fact. That is a separate story, though, and nothing we can fix in Gitaly: we must retain the grace period to avoid repository corruption.
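The same two-phase mechanism can be approximated by hand with plain Git, which may help when reasoning about why a single call is not enough. This sketch only builds the command lines and assumes a local repository path; it is not a description of what Gitaly actually executes. `30.minutes.ago` is a Git approxidate string chosen to mirror the grace period above.

```python
import subprocess
from typing import List


def explode_unreachable_cmd(repo_path: str) -> List[str]:
    # With -A and -d, repack writes everything referenced into a single new
    # pack and, instead of dropping unreachable objects from the old packs,
    # writes them out as loose objects.
    return ["git", "-C", repo_path, "repack", "-A", "-d"]


def prune_cmd(repo_path: str, expire: str = "30.minutes.ago") -> List[str]:
    # git prune only considers loose objects; --expire spares any whose
    # modification time is more recent than the given date.
    return ["git", "-C", repo_path, "prune", f"--expire={expire}"]


def run(cmd: List[str]) -> None:
    # Raise CalledProcessError if git exits with a non-zero status.
    subprocess.run(cmd, check=True)
```

A manual run would call `run(explode_unreachable_cmd(path))`, wait for the grace period, and then call `run(prune_cmd(path))`; adding `--dry-run` to the prune command lists what would be removed without deleting anything.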

Changelog: added

Fixes #4041
