Implement filtering repacks

Christian Couder requested to merge repack-filter4 into maint

Version 1 of the patch series this MR is based on was sent to the Git mailing list:

https://lore.kernel.org/git/20221012135114.294680-1-christian.couder@gmail.com/

/cc @jcaigitlab

Cover letter of V1:

Earlier this year, John Cai sent 2 versions of a patch series to implement git repack --filter=<filter-spec>:

https://lore.kernel.org/git/pull.1206.git.git.1643248180.gitgitgadget@gmail.com/

We tried to "sell" it as a way to use partial clone on a Git server to offload large blobs to, for example, an HTTP server, while using multiple promisor remotes on the client side.

Even though that is still our end goal, it seems a bit far-fetched for now, and it is not needed to justify the feature, as git repack --filter=<filter-spec> could be useful on the client side too.

For example, one might want to clone with a filter to avoid too much space being taken by some large blobs, and one might realize after some time that a number of the large blobs have still been downloaded because some old branches referencing them were checked out. In this case a filtering repack could remove some of those large blobs.

Some of the comments on the patch series that John sent were related to the possible data loss and repo corruption that a filtering repack could cause. It's indeed true that it could be very dangerous, and we agree that improvements were needed in this area.

To address this, in patch 2/3, which introduces --filter, we warn users who launch such a repack on the command line and ask them whether they really want to do it. If such a repack is not launched from a terminal, we die().

A new patch 3/3, though, introduces --force to allow users to launch such a repack without a terminal and without having to confirm it on the command line.
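As a rough illustration of the safety rules described above, here is a minimal Python sketch of the decision logic only. It is not the code from the patches: the names `Outcome` and `filtering_repack_outcome` are hypothetical, and the set of answers accepted as confirmation ("y" or "yes") is an assumption, since the exact prompt is not quoted here.

```python
from enum import Enum


class Outcome(Enum):
    PROCEED = "proceed"  # the filtering repack runs
    ABORT = "abort"      # the user declined at the prompt
    DIE = "die"          # no terminal and no --force


def filtering_repack_outcome(force, is_terminal, answer=None):
    """Model the checks described for `git repack --filter`.

    force:       True if --force was given (patch 3/3).
    is_terminal: True if the command was launched from a terminal.
    answer:      the user's reply to the confirmation prompt, if any.
    """
    if force:
        # --force skips both the terminal check and the confirmation.
        return Outcome.PROCEED
    if not is_terminal:
        # Without a terminal we cannot ask for confirmation, so we refuse.
        return Outcome.DIE
    # Assumption: only an explicit "y" or "yes" counts as confirmation.
    if answer is not None and answer.strip().lower() in ("y", "yes"):
        return Outcome.PROCEED
    return Outcome.ABORT
```

For instance, under these assumptions `filtering_repack_outcome(False, False)` yields `Outcome.DIE`, which corresponds to a script running a filtering repack without --force, while `filtering_repack_outcome(True, False)` yields `Outcome.PROCEED`.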

Patch 1/3 is a preparatory patch.

In short, this small patch series tries to reboot the previous one with a focus on the client side and a focus on safety.

Thanks to John Cai, who worked on the previous versions, and to Jonathan Nieder, Jonathan Tan and Taylor Blau, who recently discussed this with me at the Git Merge and Contributor Summit.

Edited by John Cai
