Enable 'git repack' to filter objects
The goal is for git repack to filter objects in the same way as a partial clone can. This would make it possible to easily move large blobs away from a repo that GitLab manages into a partial clone remote.
If many large blobs can be moved out of repos, they could be stored on cheaper infrastructure and fetched independently only when they are needed, which would improve the performance of the repo and of the infrastructure as a whole.
For example, large blobs could be moved to a plain HTTP partial clone remote, as shown in step 5 of this HTTP partial clone demo:
https://gitlab.com/chriscool/partial-clone-demo/-/blob/master/http-promisor/demo.txt
Ideally we would only need to run git repack -a -d --filter=<filter-spec> instead of the following steps that are currently needed:
- call `git repack -a -d` to clean up all the loose objects and have only one packfile,
- call `echo | git pack-objects <option>... --filter=<filter-spec> <tmp-name>` with a lot of options,
- remove old packfile
- rename the new packfile from its temporary name to its right name
- create an empty `<right-name>.promisor` file
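
The manual steps above can be sketched as a small shell function. This is only an illustration under stated assumptions, not the procedure from the demo: `filtered_repack` and the `tmp-filtered-$$` temporary name are hypothetical, and the sketch writes the filtered pack with `--stdout` and then indexes it with `git index-pack` instead of passing `<tmp-name>` to `git pack-objects`, because `--filter` may be rejected without `--stdout`. It also renames the new pack before removing the old ones, so that a pack is always in place.

```shell
#!/bin/sh
# Sketch of the manual steps; not what `git repack` itself does.
# `filtered_repack` is a hypothetical helper name.
# Usage: filtered_repack <filter-spec>    (for example: blob:limit=1k)
filtered_repack() {
    filter=$1
    pack_dir=$(git rev-parse --git-path objects/pack) || return 1

    # Step 1: fold loose objects and existing packs into a single packfile.
    git repack -a -d -q || return 1

    # Step 2: write a filtered pack to a temporary file (name is an assumption).
    # --all implies --revs, so stdin is read for revisions; give it nothing.
    tmp_base="$pack_dir/tmp-filtered-$$"
    git pack-objects -q --all --filter="$filter" --stdout \
        </dev/null >"$tmp_base.pack" || return 1

    # Build the index next to the temporary pack; index-pack prints the pack hash.
    hash=$(git index-pack "$tmp_base.pack") || return 1
    new_base="$pack_dir/pack-$hash"

    # Step 4 (done before step 3): rename the new files to their final names.
    # A .rev file only exists if this Git version writes reverse indexes.
    for ext in rev pack idx; do
        if [ -e "$tmp_base.$ext" ]; then
            mv "$tmp_base.$ext" "$new_base.$ext" || return 1
        fi
    done

    # Step 5: create the empty .promisor file for the new pack.
    : >"$new_base.promisor" || return 1

    # Step 3: remove every other packfile and its companion files.
    for old in "$pack_dir"/pack-*.pack; do
        [ -e "$old" ] || continue
        [ "$old" = "$new_base.pack" ] && continue
        old_base=${old%.pack}
        rm -f "$old_base.pack" "$old_base.idx" "$old_base.rev" "$old_base.bitmap"
    done
}
```

Running `filtered_repack blob:limit=1k` inside a repo would leave a single pack without blobs of 1 KiB or more. The filtered-out objects are deleted from the repo, so this should only be tried on a throwaway copy, or after the objects are available from a configured promisor remote.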
The following things currently prevent git repack -a -d --filter=<filter-spec> from working properly:
- `git repack` doesn't support the `--filter=<filter-spec>` option,
- `git repack` calls `git pack-objects` with some options currently incompatible with `--filter=<filter-spec>`,
- `git pack-objects` doesn't support `--filter=<filter-spec>` without `--stdout`,
- `git pack-objects` with `--filter=<filter-spec>` doesn't seem to be able to generate bitmaps