Introduce caching on Projects::MergeRequests::DiffsController#diffs_batch.json
## Problem to solve
During the investigation in #330893 (closed), I noticed that `Projects::MergeRequests::DiffsController#diffs_batch.json` can take a while for a large MR (example: https://staging.gitlab.com/gpt/large_projects/gitlabhq1/-/merge_requests/8785).
Locally, each batch of diffs takes around ~1.2s to load, and profiling the endpoint shows that most of that time is spent serializing the response. When whitespace changes are ignored, it's much worse: that code path has no batching yet, so it takes ~10s to load.
## Proposal
While working on a PoC for caching this endpoint (!63644 (closed)), I noticed that response time can drop to ~400ms per batch when cached (~600ms when whitespace changes are ignored). That's a ~66% improvement if my math is correct.
In the PoC, we cache the serialized response via `Rails.cache`. It utilizes the `Api::Helpers::Caching` module with some modifications so it works with Rails controllers instead of the Grape API (see `#render_cached`). This will cache each diffs batch on an MR.
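As a rough illustration of the pattern behind `#render_cached`, here is a minimal plain-Ruby sketch (no Rails dependency; `SimpleCache`, `DiffBatch`, and the serialized shape are all hypothetical stand-ins, not the actual GitLab code). The point is that the *serialized* JSON string is what gets cached, so only the first request for a batch pays the serialization cost:

```ruby
require 'json'

# Hypothetical in-memory stand-in for Rails.cache.
class SimpleCache
  def initialize
    @store = {}
  end

  # Mimics Rails.cache.fetch: return the cached value, or run the
  # block once, store its result, and return it.
  def fetch(key)
    @store.fetch(key) { @store[key] = yield }
  end
end

CACHE = SimpleCache.new

# Cache the serialized response keyed by the object's cache key, so a
# cache hit skips serialization entirely.
def render_cached(object)
  CACHE.fetch(object.cache_key) { object.as_json.to_json }
end

# Hypothetical stand-in for a diffs batch collection and its serializer.
DiffBatch = Struct.new(:cache_key, :files) do
  def as_json
    { files: files }
  end
end

batch  = DiffBatch.new('mr-diff-8785/batch-0', ['app/models/user.rb'])
first  = render_cached(batch) # serializes and stores the JSON string
second = render_cached(batch) # served straight from the cache
```

On the second call, no serialization happens at all; the stored string is returned as-is, which is where the bulk of the measured savings would come from.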
To implement this, we can:

- Utilize `#render_cached` or the equivalent that will be implemented in #332967 (closed).
- Define appropriate cache keys for `Gitlab::Diff::FileCollection::MergeRequestDiffBatch` and `Gitlab::Diff::FileCollection::Compare`. In the PoC, `MergeRequestDiff#cache_key` was used for `MergeRequestDiffBatch`, while the `head` and `base` IDs were used for `Compare`.
- Define a cache expiration. 1 day seems fine since old diff versions don't change. The `HEAD` diff can change more often depending on activity on the target branch, so we can look at defining a lower cache expiration for this case.
- Implement it behind a feature flag, then roll it out in a separate rollout issue. This way we can enable it on production and monitor how it affects our Redis cache usage.
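The cache-key choices above could be sketched as follows (plain Ruby; the attribute names `head_commit_sha` and `start_commit_sha`, and the key layout, are assumptions for illustration, not the actual GitLab implementation):

```ruby
# Hypothetical cache-key builders for the two file collections.

# For MergeRequestDiffBatch: reuse MergeRequestDiff#cache_key and append
# the pagination params so each batch gets its own cache entry.
def batch_cache_key(merge_request_diff, page:, per_page:)
  [merge_request_diff.cache_key, page, per_page].join(':')
end

# For Compare: the head and base commit IDs uniquely identify the diff.
def compare_cache_key(compare)
  ['compare', compare.head_commit_sha, compare.start_commit_sha].join(':')
end

# Hypothetical stand-ins to demonstrate the resulting keys.
Diff    = Struct.new(:cache_key)
Compare = Struct.new(:head_commit_sha, :start_commit_sha)

batch_key   = batch_cache_key(Diff.new('merge_request_diffs/42-20210601'),
                              page: 0, per_page: 30)
compare_key = compare_cache_key(Compare.new('abc123', 'def456'))
```

Because `MergeRequestDiff#cache_key` changes whenever a new diff version is created, stale batches naturally stop being read and simply expire, which is why a 1-day TTL should be safe for non-`HEAD` versions.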
Be on the lookout for possible bugs where a cached response is still returned when it shouldn't be.