⏳ Progressively load merge request diffs
<!-- triage-serverless v3 PLEASE DO NOT REMOVE THIS SECTION -->
*This page may contain information related to upcoming products, features and functionality.
It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes.
Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.*
<!-- triage-serverless v3 PLEASE DO NOT REMOVE THIS SECTION -->
## Job To Be Done
When I have decided to open a merge request to perform a code review or respond to feedback, I want to maintain my flow and focus on this task without delays that break my flow, such as a slow page load or needing to reload the page because of a timeout, so that I feel in control of what I work on and when, and so that I can be a good team member who responds in a timely fashion. I will feel in control if I can start my work immediately and, if the merge request is very large, if I can start working on something and see progress as the page loads.
## Further details
Opening a merge request to view is **slow** :turtle: because `diffs.json` and `discussions.json` must both be loaded fully before the page can be used, and both include the content of every single diff and every single discussion. This means that as the merge request grows, in either the number of files and lines changed or the number of discussions, page loads become increasingly slow, to the point of timeout and failure.
This epic is entirely focused on addressing the problems associated with loading all diff data in one go.
## Further details
For a given merge request, the life cycle of showing a rendered diff on the merge request page is:
- :fire: **fast** - Call DiffStats RPC for diff statistics (renamed/changed, number of lines changed)
- :turtle: **slow** - Generate syntax highlighted diff for the current merge request version
- Call ??? RPC for plain diff
- Call ?? RPC for each changed file, called twice: once for the HEAD and once for the merge base
- Syntax highlight each changed file, run twice: once for the HEAD and once for the merge base
- Generate syntax highlighted diff for each file
- Store the syntax highlighted diff for the merge request version
- Return `diffs.json` response using DiffStats and the syntax highlighted diff
- Client renders entire diff
## Proposal
### :bar_chart: add monitoring https://gitlab.com/gitlab-org/gitlab/issues/31286
We do not know how frequently `diffs.json` times out and leaves the page unusable. We also do not know how frequently diff limits are reached, which causes diffs to be collapsed and triggers additional requests that are slow and annoying.
### :scissors: extract DiffStats data from `diffs.json` https://gitlab.com/gitlab-org/gitlab/issues/31288
The DiffStats RPC is very fast, and the response is small, so this can be loaded quickly and early.
This means that users will be able to understand if this is a large or small merge request early, helping them understand why it might be taking a while to load.
- Create API for DiffStats
- Interface should render tree and merge request stats information based on the Diff Stats response
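
As a rough illustration of what the interface could derive from a stats-only response, here is a minimal sketch. The response shape (a list of entries with `path`, `additions`, and `deletions`) and the function name `summarize_diff_stats` are assumptions made for this example, not an existing GitLab API.

```python
from typing import Dict, List


def summarize_diff_stats(stats: List[dict]) -> dict:
    """Build a nested file tree and merge request totals from stats only.

    No diff content is needed, so this can run before any diff is loaded.
    """
    tree: Dict[str, dict] = {}
    additions = 0
    deletions = 0
    for entry in stats:
        additions += entry["additions"]
        deletions += entry["deletions"]
        # Walk or create one tree node per path segment.
        node = tree
        for part in entry["path"].split("/"):
            node = node.setdefault(part, {})
    return {
        "files": len(stats),
        "additions": additions,
        "deletions": deletions,
        "tree": tree,
    }


# Hypothetical sample data, for illustration only.
sample_stats = [
    {"path": "app/models/user.rb", "additions": 10, "deletions": 2},
    {"path": "app/models/project.rb", "additions": 3, "deletions": 0},
    {"path": "README.md", "additions": 1, "deletions": 1},
]
summary = summarize_diff_stats(sample_stats)
```

With a summary like this, the file tree and the "files changed" counters can be shown while the diffs themselves are still loading.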
### :hourglass_flowing_sand: paginate diff data https://gitlab.com/gitlab-org/gitlab/issues/31290
For a merge request where the diff has already been syntax highlighted and cached, the merge request is often still slow to load because the entire `diffs.json` must be loaded before the page can be used at all.
- Create API for loading diffs in pages to reduce size of data being transferred
- Interface should load all diffs page by page
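
The page-by-page flow could look roughly like the sketch below. The `fetch_page` callable, the `per_page` parameter, and the response keys `diff_files` and `next_page` are all hypothetical; the real API shape is still to be designed.

```python
from typing import Callable, Dict, Iterator, List, Optional

# Assumed response shape: {"diff_files": [...], "next_page": int or None}
Page = Dict[str, object]


def iter_diff_pages(
    fetch_page: Callable[[int, int], Page], per_page: int = 20
) -> Iterator[List[dict]]:
    """Yield one batch of diff files per request so each batch can render on arrival."""
    page: Optional[int] = 1
    while page is not None:
        response = fetch_page(page, per_page)
        yield response["diff_files"]
        page = response.get("next_page")


def fake_fetch_page(page: int, per_page: int) -> Page:
    """Stand-in for an HTTP request; serves slices of 45 in-memory files."""
    files = [{"path": f"file_{i}.rb"} for i in range(45)]
    start = (page - 1) * per_page
    batch = files[start:start + per_page]
    has_more = start + per_page < len(files)
    return {"diff_files": batch, "next_page": page + 1 if has_more else None}


page_sizes = [len(batch) for batch in iter_diff_pages(fake_fetch_page, per_page=20)]
```

Because the generator yields after every request, the first batch is usable as soon as it arrives instead of waiting for the full diff.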
### :lipstick: file-by-file syntax highlighting https://gitlab.com/gitlab-org/gitlab/issues/31291 https://gitlab.com/gitlab-org/gitlab/issues/30550
- Store cached syntax highlighted diff by file, not merge request
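
To make the difference concrete, here is a minimal sketch of a cache keyed per file rather than per merge request version. The class name, the key layout `(version_id, path)`, and the `fake_highlight` helper are assumptions for illustration and do not describe the current caching code.

```python
from typing import Callable, Dict, Tuple

# Key on (merge request version id, file path) instead of the version alone.
CacheKey = Tuple[str, str]


class FileHighlightCache:
    def __init__(self, highlight: Callable[[str], str]) -> None:
        self._highlight = highlight
        self._store: Dict[CacheKey, str] = {}
        self.misses = 0  # counts how often highlighting actually ran

    def fetch(self, version_id: str, path: str, raw_diff: str) -> str:
        """Return the highlighted diff for one file, highlighting only on a miss."""
        key = (version_id, path)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._highlight(raw_diff)
        return self._store[key]


def fake_highlight(raw_diff: str) -> str:
    """Placeholder for the real syntax highlighter."""
    return f"<span>{raw_diff}</span>"


cache = FileHighlightCache(fake_highlight)
cache.fetch("v1", "a.rb", "+puts 1")
cache.fetch("v1", "a.rb", "+puts 1")  # served from cache
cache.fetch("v1", "b.rb", "+puts 2")  # different file, highlighted separately
```

With per-file keys, a single file or a single page of files can be highlighted and served without highlighting the rest of the merge request first.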
### :heart: optimize direct link scenarios
Direct linking to a specific file should load the diff for just that file first, before loading other pages.
- Create API for loading the diff for a specific file
- Interface should load the directly linked file first, then backfill
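
The "linked file first, then backfill" ordering could be sketched as follows. The function name `load_order` and its parameters are hypothetical, and the sketch assumes the full list of changed paths is already known, for example from the diff stats.

```python
from typing import List


def load_order(paths: List[str], linked_path: str, per_page: int) -> List[List[str]]:
    """Return request batches: the linked file alone first, then the rest in pages."""
    rest = [p for p in paths if p != linked_path]
    # If the link points at a file that is not in this diff, fall back to plain paging.
    batches: List[List[str]] = [[linked_path]] if linked_path in paths else []
    batches += [rest[i:i + per_page] for i in range(0, len(rest), per_page)]
    return batches


batches = load_order(["a.rb", "b.rb", "c.rb", "d.rb", "e.rb"], "d.rb", per_page=2)
```

Here the first request covers only `d.rb`, and the remaining files are backfilled in their original order.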
## How will we measure success
Measure Time To First Byte (TTFB) for representative large merge requests, with these targets:
- TTFB for diff stats is less than 200ms
- TTFB for a cached syntax highlighted diff page is less than 200ms
- TTFB for an uncached syntax highlighted diff page is less than 500ms
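
A check against these targets could be as simple as the sketch below. The budget values come from the list above; the metric names and the sample measurements are made up for illustration, and how TTFB is actually collected is out of scope here.

```python
from typing import Dict, List

# Budgets in milliseconds, taken from the targets above. Metric names are hypothetical.
TTFB_BUDGET_MS: Dict[str, float] = {
    "diff_stats": 200,
    "cached_diff_page": 200,
    "uncached_diff_page": 500,
}


def over_budget(measured_ms: Dict[str, float]) -> List[str]:
    """Return the metrics that miss their target (targets are strict 'less than')."""
    return sorted(
        name for name, ms in measured_ms.items() if ms >= TTFB_BUDGET_MS[name]
    )


# Invented sample measurements, not real results.
sample = {"diff_stats": 150, "cached_diff_page": 200, "uncached_diff_page": 480}
failing = over_budget(sample)
```

In this invented sample, only the cached diff page misses its target, because 200ms is not strictly less than the 200ms budget.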
Representative merge requests:
- https://gitlab.com/gitlab-com/www-gitlab-com/merge_requests/25615
- https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/14731
## Links / references
epic