
Make git gc --prune more aggressive

Nick Thomas requested to merge git-prune-harder into master

Pruning loose objects saves space, but it risks data loss if it removes an object that a concurrent git write operation is still using. Exempting recently created objects from the prune is an imperfect way of avoiding such races.

The prune argument to the GarbageCollect RPC is used in only one place in GitLab, the "Repository Cleanup" functionality, and that place is now protected by a read-only lock, so it should now be safe to run the prune in a more aggressive form. This reduces the time the user waits for feedback on whether the cleanup has succeeded.
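As a rough illustration of what a "more aggressive" prune means on the git command line, the sketch below builds the arguments for `git gc`. The helper name `gc_args` and the choice of `now` as the expiry are assumptions made for this example; the description above does not state the exact value the change uses. `--prune=<date>` is a standard `git gc` option, and when it is omitted git falls back to `gc.pruneExpire`, which defaults to two weeks.

```python
from typing import List


def gc_args(prune: bool, expire: str = "now") -> List[str]:
    """Build an argument list for `git gc`.

    Hypothetical helper: the name and the default `expire` value are
    illustrative assumptions, not taken from the merge request.
    """
    args = ["git", "gc"]
    if prune:
        # --prune=<date> removes unreachable loose objects older than <date>.
        # With "now", even objects written moments ago are removed, which is
        # only safe when nothing else is writing to the repository.
        args.append(f"--prune={expire}")
    return args


print(gc_args(prune=False))
print(gc_args(prune=True))
```

A caller would pass the resulting list to something like `subprocess.run(..., cwd=repo_path)` only after confirming that the repository is held read-only, since that lock is what makes the short expiry acceptable.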

The repository cleanup functionality is only available to owners, and will not be routinely used. It's not ideal that the prune is only safe if the repository is already read-only, but it is a pragmatic solution that is quick to implement. The status quo is that people try to make their repositories smaller, only to find that they have inexplicably doubled in size.

We should continue to look at alternative approaches in the future that will allow us to make the prune: true argument safe in conjunction with concurrent repository writes.

Related to gitlab#220104 (closed)

Edited by Nick Thomas
