Cache the `info/refs`

Problem to solve

Each time a client asks Git for the ref advertisement, we generate it on the fly. In some instances, info/refs requests make up 75% of all Git requests.

When Gitaly uses storage that is not directly attached, like NFS, this has a big negative impact on performance. In at least one case we saw that info/refs was responsible for over 70% of all RPC calls to NFS (a ballpark number from manual testing that measured the NFS impact of running git receive-pack on a repository with a large number of refs).

Further details

Consider that even if we run pack-refs, not all refs end up in packed-refs: sometimes we have stale directories in refs/..., or refs stored as loose files directly on disk. Iterating through all refs generates, I think, one I/O operation per file, which has a very big impact on the performance of info/refs. Running pack-refs helps, but we still generate a number of I/O operations.

Periodic fetching of info/refs is very common: many Git clients, like JGit, SourceTree, Jenkins, TeamCity, etc., periodically refresh the state of a repository to check for new changes. These intervals are usually very aggressive, as low as 5-10s, which creates an abusive usage pattern on the service.

Proposal

Since info/refs is read more frequently than it is modified, even more so in environments with heavy polling, implementing a cache could reduce disk I/O significantly in some situations. This is expected to have a material impact on the performance of servers with slower disk I/O, like NFS.
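A minimal sketch of the read-mostly cache idea, in Go. This is not Gitaly's implementation: the `refsCache` type, its methods, and the `generate` callback are all hypothetical names used for illustration, and the sketch assumes that something reliably calls `Invalidate` after every ref update.

```go
package main

import (
	"fmt"
	"sync"
)

// refsCache is a hypothetical in-memory cache of ref advertisements,
// keyed by repository path.
type refsCache struct {
	mu      sync.Mutex
	entries map[string][]byte
}

func newRefsCache() *refsCache {
	return &refsCache{entries: make(map[string][]byte)}
}

// Get returns the cached advertisement for repo. On a miss it calls
// generate, stores the result, and returns it. For simplicity the lock
// is held while generating, which serializes misses across all repos.
func (c *refsCache) Get(repo string, generate func() ([]byte, error)) ([]byte, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	if b, ok := c.entries[repo]; ok {
		return b, nil
	}
	b, err := generate()
	if err != nil {
		return nil, err
	}
	c.entries[repo] = b
	return b, nil
}

// Invalidate drops the cached entry for repo. It is assumed to be
// called after any operation that changes the repository's refs.
func (c *refsCache) Invalidate(repo string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.entries, repo)
}

func main() {
	cache := newRefsCache()
	calls := 0
	// Stand-in for the expensive step of walking refs on disk.
	generate := func() ([]byte, error) {
		calls++
		return []byte("advertisement"), nil
	}

	cache.Get("repo.git", generate) // miss: generates
	cache.Get("repo.git", generate) // hit: served from the cache
	cache.Invalidate("repo.git")    // e.g. after a push
	cache.Get("repo.git", generate) // miss again: regenerates

	fmt.Println("generator calls:", calls) // prints: generator calls: 2
}
```

With clients polling every 5-10s and comparatively rare pushes, most requests in this model would be hits and would not touch the ref storage at all. The hard part, which the ideas below address, is knowing reliably when to invalidate.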

Idea 1

#1631 (comment 162098607)

Use core.logAllRefUpdates to track changes and invalidate the cache.

Idea 2

#1631 (comment 162617124)

Use aggressive git pack-refs and re-implement git upload-pack in Gitaly.

Links / references

Edited by James Ramsay (ex-GitLab)