Enable 'git repack' to filter objects

The goal is for git repack to be able to filter objects in the same way as a partial clone does. This would make it easy to move large blobs out of a repo that GitLab manages and into a partial clone remote.

If many large blobs can be moved out of repos, they could be stored on cheaper infrastructure and fetched independently, only when they are needed. This would improve the performance of both the repo and the infrastructure as a whole.

For example, large blobs could be moved to a plain HTTP partial clone remote, as shown in step 5 of this HTTP partial clone demo:

https://gitlab.com/chriscool/partial-clone-demo/-/blob/master/http-promisor/demo.txt

Ideally, we would only need to run git repack -a -d --filter=<filter-spec> instead of the following steps, which are currently needed:

  • call git repack -a -d to clean up all the loose objects and have only one packfile,
  • call echo | git pack-objects <option>... --filter=<filter-spec> <tmp-name> with a lot of options,
  • remove the old packfile,
  • rename the new packfile to its right name,
  • create an empty <right-name>.promisor file.
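
The manual steps above can be sketched as a small shell function. This is only a sketch under stated assumptions, not the exact commands from the demo. `filtered_repack` is a hypothetical helper name, not a Git command. Instead of writing to <tmp-name>, it streams the filtered pack with --stdout into git index-pack, which stores the packfile under its final name, so the rename step is folded into that call. It walks only refs and reflogs, which is far fewer options than git repack itself passes. It also assumes the filtered-out objects are already available from a promisor remote, because removing the old packfile deletes the only local copy.

```shell
# Hypothetical helper (not part of Git): repack the repository at $1 so that
# objects excluded by the filter spec $2 are no longer stored locally.
# The body runs in a subshell so that "cd" and "set -e" do not leak out.
filtered_repack() (
    set -e
    cd "$1"
    filter=$2
    packdir=$(git rev-parse --git-path objects/pack)
    tmp=$(mktemp -d)

    # Step 1: fold loose objects and existing packs into a single packfile.
    git repack -a -d -q

    # Step 2: write a filtered pack of everything reachable from refs and
    # reflogs. Assumption: --stdout is used instead of a <tmp-name> base name.
    git pack-objects --revs --all --reflog --stdout -q \
        --filter="$filter" </dev/null >"$tmp/filtered.pack"

    # Step 4 (folded in): index-pack stores the pack as pack-<hash>.pack in
    # the object directory and prints "pack<TAB><hash>" on stdout.
    hash=$(git index-pack --stdin <"$tmp/filtered.pack" | cut -f2)
    [ -n "$hash" ]
    [ -f "$packdir/pack-$hash.pack" ]

    # Step 3: remove every old packfile together with its companion files.
    for p in "$packdir"/pack-*.pack; do
        if [ "$p" = "$packdir/pack-$hash.pack" ]; then
            continue
        fi
        rm -f "${p%.pack}".*
    done

    # Step 5: create an empty <right-name>.promisor file.
    : >"$packdir/pack-$hash.promisor"

    rm -rf "$tmp"
)
```

A call could look like `filtered_repack /path/to/repo.git blob:limit=1m`, where the path and the filter spec are placeholders. The new pack is indexed before the old one is removed, so a failure in the middle leaves the original objects in place.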

The following things currently prevent git repack -a -d --filter=<filter-spec> from working properly:

  • git repack doesn't support the --filter=<filter-spec> option,
  • git repack calls git pack-objects with some options that are currently incompatible with --filter=<filter-spec>,
  • git pack-objects doesn't support --filter=<filter-spec> without --stdout,
  • git pack-objects with --filter=<filter-spec> doesn't seem to be able to generate bitmaps.
Edited by Christian Couder