[Discuss] Request limited diffs directly from Gitaly instead persisting part of it
Everyone can contribute. Help move this issue forward while earning points, leveling up and collecting rewards.
We have docs on most part of how diffs work today on https://docs.gitlab.com/ee/development/diffs.html.
In summary:
- We persist diff files content on
merge_request_diff_filesup to a certain size per merge request - We persist truncated diffs of notes on
note_diff_filesup to a certain size per diff comment
Surpassing limits mean we should fallback to Gitaly (or Rugged) to fetch diffs data.
In practice it gets fuzzy at some point because we have different sources, plus the database, limitations to load this data to the user and when fetching from the main source (Gitaly, Rugged).
Historically, the data was persisted for performance reasons (I believe Rugged didn't perform quite well, and git diff command is quite expensive sometimes - please correct if I'm wrong on the Rugged point), what today can possibly can be handled by Gitaly (moving the cache outside the Ruby application, what's reasonable).
The propose is not to move 100% away from the persisted diffs, but stop persisting it for new merge requests since we can't really fetch all of them from the source (revisions could have been removed for instance, branches removed). Also, we'd certainly need to feature flag that and make sure Gitaly is capable of handling the extra traffic reliably.
Pros:
- Have a single source of truth
- Simplify future changes that use diffs
- Avoid inconsistent persisted cache on the Ruby application
- Reduce DB table size
Cons:
- Single point of failure
- Without caching on Gitaly side, we might perceive slowness when loading MR diffs
cc @jacobvosmaer-gitlab as I've seen you mentioning something related before