Introduce caching on Projects::MergeRequests::DiffsController#diffs_batch.json
## Problem to solve
During the investigation in #330893 (closed), I noticed that `Projects::MergeRequests::DiffsController#diffs_batch.json` can take a while for a large MR (example: https://staging.gitlab.com/gpt/large_projects/gitlabhq1/-/merge_requests/8785).
Locally, each batch of diffs takes around ~1.2s to load, and profiling the endpoint shows that most of that time is spent serializing the response. When whitespace changes are ignored, it's much worse: that code path has no batching yet, so it takes ~10s to load.
## Proposal
While working on a PoC for caching this endpoint (!63644 (closed)), I noticed that response time can drop to ~400ms per batch when cached (~600ms when whitespace changes are ignored). That's a ~66% improvement if my math is correct.
In the PoC, we cache the serialized response via `Rails.cache`. It utilizes the `Api::Helpers::Caching` module with some modifications so it works with Rails controllers instead of the Grape API (see `#render_cached`). This will cache each diffs batch on an MR.
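As a rough illustration of the pattern behind `#render_cached`, here is a minimal plain-Ruby sketch (no Rails dependency; `SimpleCache`, `DiffBatch`, and the serialized shape are all hypothetical stand-ins, not the actual GitLab code). The point is that the *serialized* JSON string is what gets cached, so only the first request for a batch pays the serialization cost:

```ruby
require 'json'

# Hypothetical in-memory stand-in for Rails.cache.
class SimpleCache
  def initialize
    @store = {}
  end

  # Mimics Rails.cache.fetch: return the cached value, or run the
  # block once, store its result, and return it.
  def fetch(key)
    @store.fetch(key) { @store[key] = yield }
  end
end

CACHE = SimpleCache.new

# Cache the serialized response keyed by the object's cache key, so a
# cache hit skips serialization entirely.
def render_cached(object)
  CACHE.fetch(object.cache_key) { object.as_json.to_json }
end

# Hypothetical stand-in for a diffs batch collection and its serializer.
DiffBatch = Struct.new(:cache_key, :files) do
  def as_json
    { files: files }
  end
end

batch  = DiffBatch.new('mr-diff-8785/batch-0', ['app/models/user.rb'])
first  = render_cached(batch) # serializes and stores the JSON string
second = render_cached(batch) # served straight from the cache
```

On the second call, no serialization happens at all; the stored string is returned as-is, which is where the bulk of the measured savings would come from.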
To implement this, we can:

- Utilize `#render_cached` or the equivalent that will be implemented in #332967 (closed).
- Define appropriate cache keys for `Gitlab::Diff::FileCollection::MergeRequestDiffBatch` and `Gitlab::Diff::FileCollection::Compare`. In the PoC, `MergeRequestDiff#cache_key` was used for `MergeRequestDiffBatch`, while the `head` and `base` IDs were used for `Compare`.
- Define a cache expiration. 1 day seems fine since old diff versions don't change. The `HEAD` diff can change more often depending on activity on the target branch, so we can look at defining a lower cache expiration for this case.
- Implement it behind a feature flag, then roll it out in a separate rollout issue. This way we can enable it on production and monitor how it affects our Redis cache usage.
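The cache-key choices above could be sketched as follows (plain Ruby; the attribute names `head_commit_sha` and `start_commit_sha`, and the key layout, are assumptions for illustration, not the actual GitLab implementation):

```ruby
# Hypothetical cache-key builders for the two file collections.

# For MergeRequestDiffBatch: reuse MergeRequestDiff#cache_key and append
# the pagination params so each batch gets its own cache entry.
def batch_cache_key(merge_request_diff, page:, per_page:)
  [merge_request_diff.cache_key, page, per_page].join(':')
end

# For Compare: the head and base commit IDs uniquely identify the diff.
def compare_cache_key(compare)
  ['compare', compare.head_commit_sha, compare.start_commit_sha].join(':')
end

# Hypothetical stand-ins to demonstrate the resulting keys.
Diff    = Struct.new(:cache_key)
Compare = Struct.new(:head_commit_sha, :start_commit_sha)

batch_key   = batch_cache_key(Diff.new('merge_request_diffs/42-20210601'),
                              page: 0, per_page: 30)
compare_key = compare_cache_key(Compare.new('abc123', 'def456'))
```

Because `MergeRequestDiff#cache_key` changes whenever a new diff version is created, stale batches naturally stop being read and simply expire, which is why a 1-day TTL should be safe for non-`HEAD` versions.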
Be on the lookout for possible bugs where a cached response is still returned when it shouldn't be.