Proof of concept using git repack --filter to offload packfiles
Now that the git repack --filter patch series has been merged to master, let's spin up a proof of concept where Gitaly housekeeping knows how to call git repack with --filter. We can add a new strategy option to the OptimizeRepositoryRequest housekeeping RPC in Gitaly, so that Rails can pass in a parameter for sending certain blobs to a separate packfile.
To keep things simple, maybe Gitaly can just have a special hard-coded directory under which it puts the offloaded packfiles for each repository. Git's alternates mechanism would then be used so that Git and Gitaly can still access these separate packfiles.
Design
Triggering a filtering repack
In proto/repository.proto, OptimizeRepositoryRequest, which is the request for the OptimizeRepository RPC, contains a Strategy enum:
// Strategy determines how the repository shall be optimized.
enum Strategy {
  // STRATEGY_UNSPECIFIED indicates that the strategy has not been explicitly set by the
  // caller. The default will be STRATEGY_HEURISTICAL in that case.
  STRATEGY_UNSPECIFIED = 0;
  // STRATEGY_HEURISTICAL performs heuristical optimizations in the repository. The server will
  // decide on a set of heuristics which parts need optimization and which ones don't to avoid
  // performing unnecessary optimization tasks.
  STRATEGY_HEURISTICAL = 1;
  // STRATEGY_EAGER performs eager optimizations in the repository. The server will optimize all
  // data structures regardless of whether they are well-optimized already.
  STRATEGY_EAGER = 2;
}
I think we could add another strategy here to say that we want blobs to be moved to separate storage, maybe:
  // STRATEGY_MOVE_BLOBS performs eager optimizations in the repository, like STRATEGY_EAGER,
  // and also moves the blobs away onto separate storage.
  STRATEGY_MOVE_BLOBS = 3;
Perhaps in the future we will want to have different strategies, but I think it's enough for now.
Other code changes
In internal/git/housekeeping/objects.go, we will also need another RepackObjectsStrategy and associated code that will call performRepack() with --filter=blob:none and --filter-to=... .
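As a rough illustration of the command line such a strategy might end up producing, here is a minimal standalone Go sketch. It does not use Gitaly's real performRepack() or command builder; filteringRepackArgs and the example path are hypothetical names made up for this sketch. The -a and -d options are included because the git repack documentation recommends them together with --filter. The --filter-to value is passed through verbatim, since how Gitaly would derive it is still open.

```go
package main

import (
	"fmt"
	"strings"
)

// filteringRepackArgs is a hypothetical helper (not part of Gitaly) that
// builds the arguments for a filtering repack. filterTo is used verbatim
// as the --filter-to value; deciding what it should be is out of scope here.
func filteringRepackArgs(filterSpec, filterTo string) []string {
	return []string{
		"repack",
		// Repack everything and drop redundant packs, as recommended
		// alongside --filter in the git repack documentation.
		"-a", "-d",
		"--filter=" + filterSpec,
		"--filter-to=" + filterTo,
	}
}

func main() {
	// The destination below is a made-up placeholder, not a real Gitaly path.
	args := filteringRepackArgs("blob:none", "/hypothetical/offload/repo")
	fmt.Println("git " + strings.Join(args, " "))
}
```

In Gitaly itself these flags would presumably be expressed through the existing repack configuration rather than as raw strings; the sketch only shows which options need to reach git.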
Code will also be needed to properly set up the alternates mechanism for the repo so that it can still access the blobs that have been moved to the special hard-coded directory.
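Git reads alternates from the objects/info/alternates file, which lists one object directory per line. A minimal sketch of the setup step, assuming we only need to add one entry idempotently, could look like the following. addAlternate is a hypothetical pure helper operating on the file's content, and the paths are placeholders; reading and writing the file, locking, and transactions are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// addAlternate is a hypothetical helper that takes the current content of
// objects/info/alternates and returns the content with objectDir listed.
// If objectDir is already present, the content is returned unchanged.
func addAlternate(existing, objectDir string) string {
	var lines []string
	for _, line := range strings.Split(existing, "\n") {
		if line == "" {
			continue // skip blank lines, including the trailing one
		}
		if line == objectDir {
			return existing // already configured, nothing to do
		}
		lines = append(lines, line)
	}
	lines = append(lines, objectDir)
	return strings.Join(lines, "\n") + "\n"
}

func main() {
	// Placeholder paths, purely for illustration.
	content := addAlternate("/pool/objects\n", "/hypothetical/offload/repo/objects")
	fmt.Print(content)
}
```

Keeping the helper free of file I/O makes it easy to test, and the caller can decide how to write the result back safely.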
If the repo is later repacked using a different strategy, maybe we can also have code to remove the alternates setup for this repo.
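The cleanup could be sketched the same way, as a function over the content of objects/info/alternates. removeAlternate is a hypothetical helper and the paths are placeholders. Note that the entry should only be dropped once the offloaded blobs are available in the repository again, otherwise Git would no longer find them.

```go
package main

import (
	"fmt"
	"strings"
)

// removeAlternate is a hypothetical helper that returns the content of
// objects/info/alternates with objectDir removed. An empty result means no
// alternates remain, so the caller may delete the file instead.
func removeAlternate(existing, objectDir string) string {
	var kept []string
	for _, line := range strings.Split(existing, "\n") {
		if line == "" || line == objectDir {
			continue // drop blank lines and the entry being removed
		}
		kept = append(kept, line)
	}
	if len(kept) == 0 {
		return ""
	}
	return strings.Join(kept, "\n") + "\n"
}

func main() {
	// Placeholder paths, purely for illustration.
	before := "/pool/objects\n/hypothetical/offload/repo/objects\n"
	fmt.Print(removeAlternate(before, "/hypothetical/offload/repo/objects"))
}
```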