Support client-side pagination of Git diffs
Motivation
Gitaly provides RPCs in the diff service for clients to inspect differences between revisions of a Git repository. The more files altered between revisions, the more compute is required to generate the file diffs. An operation may therefore take too long, exceed a configured timeout, and fail. Presently, Gitaly has no way to work around this other than increasing timeouts. To mitigate this and support generating large diffs between revisions, Gitaly should provide an API that allows diffs to be generated per file. This gives clients additional flexibility by letting them choose which files they want diffs for at any particular time. The client can follow up with subsequent RPCs to get file diffs on demand. By breaking the diff operation up by file, timeouts caused by batching too many files into a single diff can be avoided.
Solution
To break up file diffs, the client must first be aware of all the files changed between a set of revisions. A summary of all the file changes between revisions of a repository, without the diffs themselves, can be generated via git-diff-tree(1). This summary also contains the before and after blob IDs for each file. Gitaly should expose an RPC that performs this underlying Git operation and provides the client with information about the diff.
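As a rough illustration of what such a summary contains, the sketch below parses the raw output format of git-diff-tree(1) (for example from `git diff-tree -r <old> <new>`) into per-file records carrying the before and after blob IDs. The `ChangedFile` type and `parse_raw_diff` function are hypothetical names chosen for this example, not part of Gitaly, and the parser assumes simple paths that Git does not need to quote.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChangedFile:
    """One entry of the raw diff summary (hypothetical type for illustration)."""
    old_mode: str
    new_mode: str
    old_blob_id: str
    new_blob_id: str
    status: str                      # e.g. "M", "A", "D", or "R86" for a rename
    path: str
    new_path: Optional[str] = None   # only set for renames and copies


def parse_raw_diff(output: str) -> List[ChangedFile]:
    """Parse raw-format lines such as ':100644 100644 <old> <new> M<TAB>path'."""
    changes = []
    for line in output.splitlines():
        if not line.startswith(":"):
            continue  # skip anything that is not a raw diff entry
        meta, *paths = line[1:].split("\t")
        old_mode, new_mode, old_id, new_id, status = meta.split(" ")
        changes.append(
            ChangedFile(
                old_mode, new_mode, old_id, new_id, status,
                path=paths[0],
                new_path=paths[1] if len(paths) > 1 else None,
            )
        )
    return changes


# Sample input with abbreviated, made-up object IDs.
SAMPLE = (
    ":100644 100644 bcd1234 0123456 M\tfile0\n"
    ":100644 100644 abcd123 1234567 R86\tfile1\tfile3\n"
    ":000000 100644 0000000 1234567 A\tfile4\n"
)

for change in parse_raw_diff(SAMPLE):
    print(change.status, change.path, change.old_blob_id, change.new_blob_id)
```

An RPC built on this operation would presumably return equivalent fields in its response messages; the exact message shape is left open here.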
To actually generate a diff for a particular file, Gitaly should introduce another RPC that performs git-diff(1) using the before and after blob IDs for that file. A pair of blobs is the smallest unit of change that can be diffed, which gives the client flexibility as to which files diffs are generated for. Because core.bigFileThreshold is set to 50MB in Gitaly, and Git treats files above that limit as binary rather than computing a diff for them, a diff request on a single file should never be large enough to trigger a timeout.
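To make the per-file step concrete, here is a minimal sketch of diffing two blobs by ID with the `git diff <blob> <blob>` form of git-diff(1). The helper names `blob_diff_command` and `diff_blobs` are assumptions made for this example and do not reflect Gitaly's actual implementation.

```python
import subprocess
from typing import List


def blob_diff_command(old_blob_id: str, new_blob_id: str) -> List[str]:
    """Build the argv for diffing two blobs directly by object ID."""
    return ["git", "diff", "--no-color", old_blob_id, new_blob_id]


def diff_blobs(repo_path: str, old_blob_id: str, new_blob_id: str) -> bytes:
    """Run the blob diff inside repo_path and return the patch as raw bytes.

    Bytes are returned because blob contents are not guaranteed to be UTF-8.
    """
    result = subprocess.run(
        blob_diff_command(old_blob_id, new_blob_id),
        cwd=repo_path,
        capture_output=True,
        check=True,  # raise if git exits with an error, e.g. an unknown object ID
    )
    return result.stdout
```

Added or deleted files appear in the raw summary with an all-zero ID on one side. This sketch does not handle that case, so a real RPC would need to treat it separately.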
Overall, this approach puts a bit more burden on the client, which must coordinate the RPC calls that generate individual file diffs. In exchange, the client gains control over which diffs are requested, allowing for better pagination.
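The client-side coordination described above could look roughly like the following sketch. It splits the list of changed files into pages and requests diffs only for the page being displayed. `paginate` and `diffs_for_page` are hypothetical helpers, and `fetch_diff` stands in for whatever per-file diff RPC the client would call.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

# A changed file reduced to its (before, after) blob IDs.
BlobPair = Tuple[str, str]


def paginate(changes: Sequence[BlobPair], page_size: int) -> Iterator[Sequence[BlobPair]]:
    """Yield consecutive pages of at most page_size changed files."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    for start in range(0, len(changes), page_size):
        yield changes[start:start + page_size]


def diffs_for_page(
    page: Sequence[BlobPair],
    fetch_diff: Callable[[str, str], str],
) -> List[str]:
    """Request one diff per file on the page via the supplied callable."""
    return [fetch_diff(old_id, new_id) for old_id, new_id in page]


# Usage with a stand-in for the per-file diff RPC.
changes = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]
pages = list(paginate(changes, page_size=2))
first_page_diffs = diffs_for_page(pages[0], lambda old, new: f"diff {old}..{new}")
```

Because each page issues its own small requests, no single call has to cover the full set of changed files.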
Related: #5486 (comment 1502771589)