Implement multi-pack-index maintenance job

Background

After the conversation in !2054 (comment 323734866), it occurred to me that GitLab/Gitaly does not run multi-pack-index as a maintenance job, which I think it should.

What

So I am proposing to implement this as a background maintenance job, similar to Derrick's proposal in https://github.com/gitgitgadget/git/pull/597/commits

How

Here is some pseudo-code in bash to demonstrate what the housekeeping job should look like:

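    git config core.multiPackIndex true

    # Write (or refresh) the multi-pack-index.
    git multi-pack-index write --no-progress

    # If verification fails, drop the index and rebuild it from scratch.
    if ! git multi-pack-index verify --no-progress; then
      rm -f "${PROJ_DIR}/.git/objects/pack/multi-pack-index"
      git multi-pack-index write --no-progress
    fi

    # Remove tracked packs that no longer hold any referenced objects, then repack.
    git multi-pack-index expire --no-progress
    git multi-pack-index repack --no-progress  # With configurable --batch-size=<size> option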

After two runs (so that the old, already-repacked pack files get cleaned up by expire), the pack files should be a lot better organized.
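The effect of the two runs can be checked by counting pack files before and after each pass. The following is only a minimal sketch, not how Gitaly does or should implement it: the function names `count_packs` and `midx_maintenance` and the `MIDX_BATCH_SIZE` variable are hypothetical, and it assumes a non-bare repository laid out like `$PROJ_DIR` in the pseudo-code above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names and MIDX_BATCH_SIZE are assumptions for illustration.

# Print the number of *.pack files in the pack directory of the repository at $1.
count_packs() {
  local pack_dir="$1/.git/objects/pack"
  find "$pack_dir" -maxdepth 1 -type f -name '*.pack' 2>/dev/null | wc -l | tr -d ' '
}

# Run one write/expire/repack pass and report the pack count before and after.
midx_maintenance() {
  local proj_dir="$1"
  # A batch size of 0 asks git to repack everything referenced by the index.
  local batch_size="${MIDX_BATCH_SIZE:-0}"
  local before after
  before=$(count_packs "$proj_dir")
  git -C "$proj_dir" multi-pack-index write --no-progress &&
    git -C "$proj_dir" multi-pack-index expire --no-progress &&
    git -C "$proj_dir" multi-pack-index repack --no-progress --batch-size="$batch_size" ||
    return 1
  after=$(count_packs "$proj_dir")
  echo "packs: before=$before after=$after"
}
```

Calling `midx_maintenance "$PROJ_DIR"` twice should show the count drop on the second pass, since the packs made redundant by the first repack are only removed by the next expire.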

Additionally, we can implement a housekeeping job to pack up loose objects, so that loose objects slowly get packed and repacked under this scheme. Here is some more pseudo-code:

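    # Remove loose objects that already exist in a pack.
    git prune-packed --quiet

    # If any loose-object fan-out directory remains, pack its contents.
    if ls "${PROJ_DIR}"/.git/objects/?? 1> /dev/null 2>&1; then
      # Turn each loose object path back into its object ID for pack-objects.
      find "${PROJ_DIR}"/.git/objects/?? -type f |
        perl -pe "s@^${PROJ_DIR}/.git/objects/(..)/@\$1@" |
        git pack-objects -q "${PROJ_DIR}/.git/objects/pack/loose"

      git prune-packed --quiet
    fi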
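The `ls` check above only tests whether a fan-out directory exists, not whether it still contains files. A slightly stricter check could count the loose object files directly. This is a sketch under the same `$PROJ_DIR` layout assumption; the function name `count_loose_objects` is hypothetical, and it counts every regular file in the fan-out directories, including any temporary files git may have left there.

```shell
#!/usr/bin/env bash
# Hypothetical helper; the function name is an assumption for illustration.

# Print the number of regular files under the two-character hex fan-out
# directories (objects/00 .. objects/ff) of the repository at $1.
# The pack/ and info/ directories are skipped because their names do not match.
count_loose_objects() {
  local objects_dir="$1/.git/objects"
  find "$objects_dir"/[0-9a-f][0-9a-f] -maxdepth 1 -type f 2>/dev/null | wc -l | tr -d ' '
}
```

The packing step could then be guarded with `[ "$(count_loose_objects "$PROJ_DIR")" -gt 0 ]` instead of the `ls` test.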

I foresee two tasks that need to happen:

  • Having Gitaly support multi-pack-index operations
  • Having gitlab-rails/sidekiq schedule these operations

Why

Please read through https://lore.kernel.org/git/20180107181459.222909-1-dstolee@microsoft.com/T/#u to understand the details and the performance benefits.

This housekeeping scheme mostly benefits the client side, but it also helps a great deal with operations such as git log.

Having this also opens a path to !2054 (comment 323734866), which removes the need to unpack data into loose objects on push/fetch operations and thus makes pushes faster on NFS-based servers.
