Cache the `info/refs`
Problem to solve
Each time a client asks Git for the advertised refs, we generate them on the fly. In some instances, requests for `info/refs` make up 75% of all Git requests.
When Gitaly uses storage that is not locally attached, like NFS, this has a big negative impact on performance. In at least one case we saw that `info/refs` was responsible for over 70% of all RPC calls to NFS (ballpark number from manually measuring the impact on NFS of running `git receive-pack` on a repository with a large number of refs).
Further details
Consider that even if we run `pack-refs`, not all refs end up in `packed-refs`: sometimes we have stale directories in `refs/...`, or refs stored directly on disk as loose files. Iterating through all refs generates, I think, one I/O operation per file, which has a very big impact on the performance of `info/refs`. Running `pack-refs` helps, but we still generate a number of I/O operations.
Periodically fetching `info/refs` is very common among Git clients that refresh the state of a repository to check for new changes, such as JGit, SourceTree, Jenkins, TeamCity, etc. These polling intervals are usually very aggressive, as low as 5-10s, which creates an abusive usage pattern on the service.
Proposal
Since `info/refs` is read more frequently than it is modified, even more so in environments with heavy polling, implementing a cache could reduce disk I/O significantly in some situations. This is expected to have a material impact on the performance of servers with slower disk I/O, like NFS.
Idea 1
Use `core.logAllRefUpdates` to track changes and invalidate the cache.
Idea 2
Use aggressive `git pack-refs` and re-implement `git upload-pack` in Gitaly.