Rethinking how we render diffs
The Issue
Currently rendering diffs is slow. There's probably quite a few reasons why this is, but the two most notable issues I can see are:
- We cache the highlighted diffs but not the rendered HTML from showing the diffs
- We build the diff markup line-by-line, checking for discussions
Proposals
Cache keys
We need to verify the state of the cache_key
field for the Project
. We don't seem to be using this at the moment. Ideally we need to trigger this to change whenever any action takes place on the project, but notably whenever someone pushes.
If possible it would also be good to have a viable cache_key
for specific commits. This would need to account for force-pushes.
Render diffs faster, in a cachable, generic way
We should then refactor how our diffs are rendered to remove any user-specific aspects, so that we can cache all the markup rendered for the diff in a single blob. I previously separated the line views for this purpose but it would be better to remove the discussion aspect entirely.
Reducing the complexity of the view here would be important as the rendering itself is slow right now and caching alone won't improve that. Every small optimisation here will have a large impact due to the number of lines being rendered; maybe we even look at alternative ways of rendering this area, such as in helpers?
It might be worth looking at ways of compressing the rendered markup as well. We do this directly in Redis but I'm not sure if it re-expands it afterwards, which would be unnecessary (need to check this).
Is there the opportunity to use multiple threads to speed up rendering of the diffs here? Once the discussions are removed I believe there should be no database dependencies in the renderer, and so we might be able to more freely use threads to render either multiple lines simultaneously, or multiple diffs simultaneously.
Split discussions or other user context off into the frontend
We can add contextual data to the rendered markup to give the frontend hooks to attach controls to if needed. But by removing the discussion interactions from the diff lines we can cache entire diffs and render the lines much faster. It would even be faster for us to just add the discussions on here in a second loop through the rendered/cached lines in the backend, I believe.
Multi-fetch the cached rendered diffs
Ideally we're now caching entire rendered diffs for each file, and we would probably want to fetch them in one network call to Redis rather than one call per file.
Alternatively we could look at mixing individual calls with streaming responses to display the diffs as they are returned, but this would add complexity.
Use cache-key rotation for cache expiry
Currently most of our caches are expired with explicit delete calls, but by moving to cache-key rotation expiry here we'll save on network requests and make the caching more flexible and easier to work with.
Pre-caching
Once this is in place we open up the possibility of pre-caching the rendered diffs on push, which would provide a nice speed boost to the first page load for the commit or MR changes views.