Implement multi-pack-index maintenance job
## Background

After the conversation in https://gitlab.com/gitlab-org/gitaly/-/merge_requests/2054#note_323734866, it occurred to me that GitLab/Gitaly does not run `git multi-pack-index` as a maintenance job, which I think it should.

## What

I propose implementing this as a background maintenance job, similar to Derrick's proposal in https://github.com/gitgitgadget/git/pull/597/commits

## How

Here is some pseudo-code in bash to demonstrate what the housekeeping job should look like:

```
# Enable use of the multi-pack-index, then write it
git config core.multiPackIndex true
git multi-pack-index write --no-progress

# If verification fails, delete the index and rebuild it
if ! git multi-pack-index verify --no-progress; then
  rm -f "${PROJ_DIR}/.git/objects/pack/multi-pack-index"
  git multi-pack-index write --no-progress
fi

# Delete packs that no longer have any objects referenced by the index
git multi-pack-index expire --no-progress
# Repack small packs into a new pack; configurable via --batch-size=<size>
git multi-pack-index repack --no-progress
```

After 2 runs (so that the old, already-repacked pack files get cleaned up by `expire`), the pack files should be much better organized.

Additionally, we can implement a housekeeping job that packs up loose objects, so that loose objects are gradually packed and then repacked under this scheme. Here is some more pseudo-code:

```
# Remove loose objects that already exist in a pack
git prune-packed --quiet

# If any loose-object directories remain, pack their contents
if ls "${PROJ_DIR}"/.git/objects/?? 1> /dev/null 2>&1; then
  # Turn each loose-object path into its object ID and feed it to pack-objects
  find "${PROJ_DIR}"/.git/objects/?? -type f |
    perl -pe "s@^${PROJ_DIR}/.git/objects/(..)/@\$1@" |
    git pack-objects -q "${PROJ_DIR}/.git/objects/pack/loose"
  git prune-packed --quiet
fi
```

I foresee 2 tasks that need to happen:

- [ ] Having Gitaly support multi-pack-index operations
- [ ] Having gitlab-rails/sidekiq schedule these operations

## Why

Please read through https://lore.kernel.org/git/20180107181459.222909-1-dstolee@microsoft.com/T/#u to understand the details and the performance benefits. This housekeeping scheme mostly benefits the client side, but it also helps considerably with operations such as `git log`.

Having this also opens a path to https://gitlab.com/gitlab-org/gitaly/-/merge_requests/2054#note_323734866, which removes the need to unpack data into loose objects on push/fetch operations and thus makes pushes faster on NFS-based servers.

## Reference

- https://git-scm.com/docs/multi-pack-index
- https://github.com/gitgitgadget/git/pull/597/
- https://lore.kernel.org/git/20180107181459.222909-1-dstolee@microsoft.com/
- https://git-scm.com/docs/git-multi-pack-index
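
To make the multi-pack-index sequence under "How" easier to experiment with, here is a minimal sketch that wraps it in a function with a configurable batch size and a dry-run mode. The names `run_git`, `midx_maintenance`, and `DRY_RUN` are hypothetical and exist only for this illustration; they are not Gitaly or GitLab settings. The sketch also assumes a non-bare repository layout (`.git/objects/...`), as the pseudo-code above does.

```shell
#!/usr/bin/env bash
# Sketch only: run_git, midx_maintenance and DRY_RUN are hypothetical
# names for illustration, not existing Gitaly or GitLab settings.

# Print the git command when DRY_RUN=1, otherwise run it.
run_git() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'git %s\n' "$*"
  else
    git "$@"
  fi
}

# Run one multi-pack-index maintenance pass on a non-bare repository.
# $1: repository directory
# $2: repack batch size (0, the default, repacks all objects referenced
#     by the multi-pack-index into a single pack)
midx_maintenance() {
  local repo_dir=$1
  local batch_size=${2:-0}
  local midx="$repo_dir/.git/objects/pack/multi-pack-index"

  run_git -C "$repo_dir" config core.multiPackIndex true
  run_git -C "$repo_dir" multi-pack-index write --no-progress

  # Rebuild the index from scratch if verification fails.
  if ! run_git -C "$repo_dir" multi-pack-index verify --no-progress; then
    rm -f "$midx"
    run_git -C "$repo_dir" multi-pack-index write --no-progress
  fi

  run_git -C "$repo_dir" multi-pack-index expire --no-progress
  run_git -C "$repo_dir" multi-pack-index repack --no-progress \
    --batch-size="$batch_size"
}
```

Calling `DRY_RUN=1 midx_maintenance /path/to/repo 2g` prints the five git commands without touching the repository, which may be useful for reviewing what a scheduled job would do. A real implementation in Gitaly would presumably issue these commands through its own git execution layer rather than a shell script.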
issue