Enable Gitaly pack-objects cache in reference architectures

The Gitaly pack-objects cache was added in GitLab 13.11. Since then we have made a number of efficiency improvements (mainly gitlab-com/gl-infra&463 (closed)), and we are now able to serve gitlab-org/gitlab CI Git fetch traffic without needing a custom CI_PRE_CLONE_SCRIPT.

Because using GitLab with CI is a common use case, we should consider turning on the pack-objects cache in our reference architectures.

Considerations:

  • It is important to do this with 14.5 or newer to get the improvements of gitlab-com/gl-infra&463 (closed).
  • We should probably also recommend concurrency limiting, because we have seen self-managed support cases where unbounded concurrency causes the Gitaly server to be overwhelmed in spite of the cache.
  • We should also recommend shallow clone because it reduces the amount of data that has to be transferred. But shallow clone is a project CI setting, so maybe not something we can address in the reference architecture?

Why is the pack-objects cache not on by default?

The pack-objects cache deduplicates concurrent Git fetches on the server by buffering their responses to a file. When you enable the pack-objects cache, you start generating a significant amount of disk write IO. At worst, when the cache hit rate is 0%, you write as many bytes/second to disk as you send out the network interface of the Gitaly server. Whether that is OK depends on the GitLab server infrastructure.
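
To make the worst case concrete, here is a rough back-of-the-envelope sketch in Python. The function name and the linear model are assumptions for illustration, not something Gitaly exposes: it assumes only cache misses are written to disk, so the write rate scales with (1 - hit rate). Only the 0% hit rate endpoint comes from the description above; the values in between are an approximation.

```python
def estimated_disk_write_rate(egress_bytes_per_sec: float, cache_hit_rate: float) -> float:
    """Estimate disk write IO (bytes/second) caused by the pack-objects cache.

    Hypothetical model: every cache miss is written to disk once, and cache
    hits are served from the existing file without further writes.
    cache_hit_rate is a fraction between 0.0 and 1.0, measured in bytes served.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0.0 and 1.0")
    # At a 0% hit rate this equals the network egress rate (the worst case).
    return egress_bytes_per_sec * (1.0 - cache_hit_rate)


# Example: a Gitaly server sending 100 MB/s of fetch traffic.
worst_case = estimated_disk_write_rate(100e6, 0.0)   # no deduplication at all
mostly_hits = estimated_disk_write_rate(100e6, 0.75)  # 3 in 4 bytes served from cache
```

Comparing such an estimate against the sustained write throughput of the Gitaly storage volume is one way to judge whether a given deployment can absorb the extra IO.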

It would be a good outcome if we decide we can turn on the pack-objects cache by default, but we would have to validate the IO impact on different kinds of deployments first.

Edited by Jacob Vosmaer