Implement multi-pack-index maintenance job
Background
After the conversation in !2054 (comment 323734866), it occurs to me that GitLab/Gitaly does not run multi-pack-index as a maintenance job, which I think it should.
What
So I am proposing implementing this as a background maintenance job, similar to Derrick's proposal in https://github.com/gitgitgadget/git/pull/597/commits
How
Here is some pseudo-code in bash to demonstrate what the housekeeping job should look like:
git config core.multiPackIndex true
git multi-pack-index write --no-progress;
if ! git multi-pack-index verify --no-progress; then
# The index failed verification: delete it and write a fresh one.
rm -f "${PROJ_DIR}/.git/objects/pack/multi-pack-index";
git multi-pack-index write --no-progress;
fi
git multi-pack-index expire --no-progress;
git multi-pack-index repack --no-progress; # With configurable --batch-size=<size> option
After 2 runs (so that the old, already-repacked pack files get cleaned up by expire), the pack files should be a lot better organized.
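To see the effect of the two runs, something like the sketch below could count pack files before and after. The helper names `count_packs` and `run_midx_maintenance` and the `batch_size` parameter are made up for illustration; they are not existing Git or Gitaly names. The git commands are the same ones used in the pseudo-code above, and the batch size defaults to 0 here, which Git treats as "repack everything referenced by the multi-pack-index into one pack".

```shell
# Sketch only: helper names and the batch-size default are assumptions.

# Print the number of *.pack files in <repo>/.git/objects/pack.
count_packs() {
    local pack_dir="$1/.git/objects/pack"
    find "$pack_dir" -maxdepth 1 -type f -name '*.pack' 2>/dev/null | wc -l | tr -d ' '
}

# One maintenance pass: write, verify (rewrite on failure), expire, repack.
run_midx_maintenance() {
    local repo="$1"
    local batch_size="${2:-0}"
    git -C "$repo" config core.multiPackIndex true
    git -C "$repo" multi-pack-index write --no-progress
    if ! git -C "$repo" multi-pack-index verify --no-progress; then
        rm -f "$repo/.git/objects/pack/multi-pack-index"
        git -C "$repo" multi-pack-index write --no-progress
    fi
    git -C "$repo" multi-pack-index expire --no-progress
    git -C "$repo" multi-pack-index repack --no-progress --batch-size="$batch_size"
}

# Possible usage (run twice so that expire can drop the packs that the
# first repack made redundant):
#   before=$(count_packs "$PROJ_DIR")
#   run_midx_maintenance "$PROJ_DIR" && run_midx_maintenance "$PROJ_DIR"
#   echo "pack files: $before -> $(count_packs "$PROJ_DIR")"
```

The second pass matters because repack only creates the new pack; the packs it replaced are removed by the expire step of the following run.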
Additionally, we can implement a housekeeping job that packs up loose objects, so that they slowly get packed and then repacked under this scheme. Here is some more pseudo-code:
git prune-packed --quiet;
if ls "${PROJ_DIR}"/.git/objects/?? 1> /dev/null 2>&1 ; then
find "${PROJ_DIR}"/.git/objects/?? -type f |\
perl -pe "s@^${PROJ_DIR}/.git/objects/(..)/@\$1@" |\
git pack-objects -q "${PROJ_DIR}/.git/objects/pack/loose";
git prune-packed --quiet;
fi
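The find/perl step above turns each loose object path back into an object ID: a loose object is stored as `objects/<first 2 hex characters>/<remaining hex characters>`, so the ID is the directory name joined with the file name. A minimal sketch of that mapping in plain shell follows. `list_loose_object_ids` is a hypothetical helper name, and this version uses sed rather than perl and only accepts all-hex names, which is an assumption about what should be packed.

```shell
# Sketch only: `list_loose_object_ids` is a hypothetical helper name.

# Print the ID of every loose object under <repo>/.git/objects, one per line.
# Only files directly inside a two-hex-character directory whose own name is
# all hex (38 characters or more) are treated as loose objects, so pack files,
# info/ entries and temporary files are skipped.
list_loose_object_ids() {
    local objects_dir="$1/.git/objects"
    find "$objects_dir" -mindepth 2 -maxdepth 2 -type f 2>/dev/null |
        sed -n 's@^.*/\([0-9a-f]\{2\}\)/\([0-9a-f]\{38,\}\)$@\1\2@p'
}

# Possible usage, feeding the IDs to pack-objects as in the pseudo-code:
#   list_loose_object_ids "$PROJ_DIR" |
#       git -C "$PROJ_DIR" pack-objects -q .git/objects/pack/loose
```

Matching on the end of the path avoids interpolating `${PROJ_DIR}` into a regular expression, which would misbehave if the path contained regex metacharacters.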
There are 2 tasks that I foresee need to happen:
- Having Gitaly support multi-pack-index operations
- Having gitlab-rails/sidekiq schedule these operations
Why
Please read through https://lore.kernel.org/git/20180107181459.222909-1-dstolee@microsoft.com/T/#u to understand the details and the performance benefits.
This housekeeping scheme mostly benefits the client side, but it also helps a ton with operations such as git log.
Having this also opens a path to !2054 (comment 323734866), which removes the need to unpack data into loose objects on push/fetch operations and thus makes pushes faster on NFS-based servers.