Proposal: Speed up MR Diffs loading and rendering with data caching

Summary

A lot of perceived poor performance in the MR Diffs app is due to heavy network load and wait times.

We can prevent those load times in many cases by loading and displaying from a local cache first, then letting network requests complete to expand or update the data.

Result

For many requests, the app will be able to load and display information nearly instantly - without waiting for a network request¹.

Critically, the app would need partial loading states: when data has been populated, but the app is currently loading data that could change what's displayed. This may need to disallow commenting, for example, to avoid a user trying to leave a comment on a file or a line that doesn't exist in an updated version.

Implementation

Create a local database with IndexedDB that stores MR info and Diff File info (to start).

These two types of information are both uniquely identifiable: Diff Files are IDed when they are loaded from the backend, and MRs can be identified in a similar way using their user/group, project, number, and version.

New interceptors will be used to coordinate two potential data sources: the local database and the remote API request. Calls to the interceptor would simultaneously check the database for information², and begin a load from the back end³. Because MR information ("metadata") and Diff File information can both be uniquely identified, these loads are inserted into the local database as soon as a network request completes - after they've been identified, but before any other data is added or removed from the payload.

Note that there's no caveat to the load from the backend: since we don't have any expiration date for MR or Diff information, the front end must request an update from the back end.

Security

Since this information is stored locally to a user, there is a risk of anything confidential being leaked.

There are many ways to avoid this, of increasing complexity:

Delete everything in the database when a browser session ends
- This is the most obvious way to avoid leaking data, but would likely also remove most of the performance gains for repeat visits.
Delete everything in the database when the user logs out
- This has the same problems as before, but may happen less frequently.
- In addition, there's still potential for leaked data, as a user may have a cached local copy of an issue that was public, but has been changed to confidential
Never store anything marked confidential
- This may still leave once-public items turned confidential in the cache
Encrypt the data stored in the database with a user's key

In any of these scenarios, we would also want to delete any data from the local database that the user cannot access. E.g., if an MR is switched to confidential and the user has a cached copy of the public versions of that MR, the next attempt to load that MR for that user that fails should delete all past copies of it.

There are undoubtedly even more potential security scenarios that should be fully explored before an implementation decision is made.

Implementation Details

IDB wrapper that turns the (extremely clunky, difficult to use) browser-native API into a promise-based abstraction: DexieJS.

¹: The assumption here is that the majority of page loads are to MRs already visited at least once. There will be no benefit when visiting MRs the user has never seen before.
²: The database check here will be for broad information like the current MR, not for specific information like the current version. This way it can load the latest version stored in the database, which may not be the latest version that exists.
³: Not only will this have an significant positive impact on the load times for the app, it will also lay most of the groundwork for a fully offline app, which would only require an additional Service Worker to intercept requests at the network layer.

Edited Feb 19, 2021 by Thomas Randolph