[Discuss] Request limited diffs directly from Gitaly instead persisting part of it

Everyone can contribute. Help move this issue forward while earning points, leveling up and collecting rewards.

  • Close this issue

We have docs on most part of how diffs work today on https://docs.gitlab.com/ee/development/diffs.html.

In summary:

  • We persist diff files content on merge_request_diff_files up to a certain size per merge request
  • We persist truncated diffs of notes on note_diff_files up to a certain size per diff comment

Surpassing limits mean we should fallback to Gitaly (or Rugged) to fetch diffs data.

In practice it gets fuzzy at some point because we have different sources, plus the database, limitations to load this data to the user and when fetching from the main source (Gitaly, Rugged).

Historically, the data was persisted for performance reasons (I believe Rugged didn't perform quite well, and git diff command is quite expensive sometimes - please correct if I'm wrong on the Rugged point), what today can possibly can be handled by Gitaly (moving the cache outside the Ruby application, what's reasonable).

The propose is not to move 100% away from the persisted diffs, but stop persisting it for new merge requests since we can't really fetch all of them from the source (revisions could have been removed for instance, branches removed). Also, we'd certainly need to feature flag that and make sure Gitaly is capable of handling the extra traffic reliably.

Pros:

  • Have a single source of truth
  • Simplify future changes that use diffs
  • Avoid inconsistent persisted cache on the Ruby application
  • Reduce DB table size

Cons:

  • Single point of failure
  • Without caching on Gitaly side, we might perceive slowness when loading MR diffs

cc @jacobvosmaer-gitlab as I've seen you mentioning something related before

cc @smcgivern @DouweM

Edited Jun 23, 2025 by 🤖 GitLab Bot 🤖
Assignee Loading
Time tracking Loading