Schedule `git pack-refs` after push

Jan Provaznik requested to merge pack-refs into master

What does this MR do?

Schedules git pack-refs to run after git push so that the number of loose ref files does not grow too large.

The problem is that whenever a client asks for refs, we read them from Gitaly. Reading each ref file with a separate I/O operation is expensive. We already run pack refs as part of GitGarbageCollectWorker's gc task, which runs once per 200 git push operations. But because fetching refs is a very common operation, it may still be expensive to read all the ref files generated during 200 pushes, especially if the deployment uses slow storage for repositories (e.g. an NFS share).
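To get a feel for the cost described above, here is a minimal sketch that counts loose ref files versus packed refs in a repository. It assumes the standard Git on-disk layout (one file per loose ref under `refs/`, and a single `packed-refs` file in the git directory); the function names are made up for illustration and are not part of GitLab or Gitaly.

```python
import os


def count_loose_refs(git_dir):
    """Count loose ref files under <git_dir>/refs (one file per unpacked ref)."""
    refs_root = os.path.join(git_dir, "refs")
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(refs_root):
        total += len(filenames)
    return total


def count_packed_refs(git_dir):
    """Count refs listed in <git_dir>/packed-refs (0 if the file is absent)."""
    path = os.path.join(git_dir, "packed-refs")
    if not os.path.exists(path):
        return 0
    count = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, the "# pack-refs with: ..." header,
            # and "^<sha>" lines that record peeled tag targets.
            if not line or line.startswith("#") or line.startswith("^"):
                continue
            count += 1
    return count
```

For a bare repository, `git_dir` is the repository path itself; for a working copy it is the `.git` directory. Running `git pack-refs --all` should move the loose refs into `packed-refs`, so the first count drops and the second rises.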

To optimize this further, we could run pack_refs more often. https://gitlab.com/gitlab-org/gitlab-ce/issues/59715#note_163806764 suggests running pack_refs asynchronously after each push with some delay. Another option is to call it as part of the GitGarbageCollectWorker task, which we trigger every few pushes.

The benefit of using the existing Housekeeping service is that we can reuse some code and avoid potential race conditions: if we scheduled pack_refs independently as a separate async job, it might run at the same time as GitGarbageCollectWorker.
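The race-avoidance argument can be sketched as a single decision point that picks at most one housekeeping task per push. This is only an illustration in Python, not the actual Housekeeping service code: `task_for_push` and `PACK_REFS_PERIOD` are hypothetical names, and the pack_refs period of 10 is an assumed value (the description only states the gc period of 200).

```python
GC_PERIOD = 200        # from the description: gc runs once per 200 pushes
PACK_REFS_PERIOD = 10  # assumed value, for illustration only


def task_for_push(pushes_since_gc):
    """Return the single task to run after this push, or None.

    `pushes_since_gc` is assumed to be a 1-based counter that the caller
    increments on every push and resets after gc has run.
    """
    if pushes_since_gc % GC_PERIOD == 0:
        # gc already packs refs, so no separate pack_refs job is needed.
        return "gc"
    if pushes_since_gc % PACK_REFS_PERIOD == 0:
        return "pack_refs"
    return None
```

Because one function decides between the tasks, a push never enqueues both gc and pack_refs for the same repository, which is the overlap an independently scheduled async job could cause.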

Closes https://gitlab.com/gitlab-org/gitlab-ce/issues/59715

Does this MR meet the acceptance criteria?

Conformity

Performance and testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Jan Provaznik