
Diff syntax highlighting takes a considerable amount of time to process on the first load

Some context on how the highlighting backend works:

  1. On the first /diffs load we use Rouge to highlight both the old and new blobs (we need both to present the diff properly)
  2. The whole highlighted diff is cached in Redis
  3. We reset the cache when the diffs are reloaded, or after 1 week
  4. Only MRs have their highlighted diffs cached
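
To make the flow above concrete, here is a minimal sketch of the caching behaviour. It is not the actual backend code: `HighlightCache`, `fetch`, and `reset` are hypothetical names, a plain dict stands in for Redis, and the `highlight` callable stands in for Rouge.

```python
import time
from typing import Callable, Dict, Tuple

ONE_WEEK_SECONDS = 7 * 24 * 60 * 60


class HighlightCache:
    """In-memory stand-in for the Redis cache of highlighted MR diffs."""

    def __init__(self, highlight: Callable[[str], str],
                 ttl: float = ONE_WEEK_SECONDS,
                 clock: Callable[[], float] = time.time) -> None:
        self._highlight = highlight  # hypothetical highlighter (Rouge in the real backend)
        self._ttl = ttl
        self._clock = clock
        # mr_id -> (expires_at, highlighted diff)
        self._store: Dict[str, Tuple[float, str]] = {}

    def fetch(self, mr_id: str, old_blob: str, new_blob: str) -> str:
        """Return the cached highlighted diff, computing it on a miss."""
        entry = self._store.get(mr_id)
        if entry is not None and entry[0] > self._clock():
            return entry[1]
        # Miss or expired entry: highlight both blobs, as in step 1.
        highlighted = self._highlight(old_blob) + "\n" + self._highlight(new_blob)
        self._store[mr_id] = (self._clock() + self._ttl, highlighted)
        return highlighted

    def reset(self, mr_id: str) -> None:
        """Drop the cached entry, e.g. when the diffs are reloaded."""
        self._store.pop(mr_id, None)
```

The point of the sketch is that only the first `fetch` for an MR (or the first one after a reset or expiry) pays the highlighting cost, which is why the slowness shows up on the first load.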

Given that, I've run a few local tests on a big MR such as oswaldo/nautilus-test!1 (diffs), and the output is at https://gitlab.com/snippets/1743242.

Summary

  • 122 files
  • 244 blobs
  • Highlighting alone takes 7.7 seconds (take this with a grain of salt, it was measured on localhost)
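
For a rough sense of scale, the figures above can be turned into per-blob and per-file averages. This is only arithmetic on the localhost numbers, so the same grain of salt applies; the constant and function names are just for illustration.

```python
FILES = 122
BLOBS = 244  # two blobs (old and new) per file
HIGHLIGHT_SECONDS = 7.7


def average_ms(total_seconds: float, count: int) -> float:
    """Average time per item, in milliseconds."""
    return total_seconds / count * 1000


per_blob_ms = average_ms(HIGHLIGHT_SECONDS, BLOBS)
per_file_ms = average_ms(HIGHLIGHT_SECONDS, FILES)
print(f"{per_blob_ms:.1f} ms per blob, {per_file_ms:.1f} ms per file")
# prints "31.6 ms per blob, 63.1 ms per file"
```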

Possible solutions

A quick improvement would be to schedule the highlighting upon MR creation:

  • 👍 Perceived performance improvement on first load
  • 👎 We would probably cache more and spend more memory
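
A rough sketch of that idea, assuming a thread pool stands in for whatever background job mechanism would actually be used. `PrewarmingHighlighter`, `on_mr_created`, and `first_diffs_load` are hypothetical names, and `highlight` again stands in for Rouge.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List


class PrewarmingHighlighter:
    """Starts highlighting when an MR is created instead of on first load."""

    def __init__(self, highlight: Callable[[str], str], workers: int = 2) -> None:
        self._highlight = highlight  # hypothetical highlighter
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._pending: Dict[str, Future] = {}  # mr_id -> background result

    def _highlight_all(self, blobs: List[str]) -> List[str]:
        return [self._highlight(blob) for blob in blobs]

    def on_mr_created(self, mr_id: str, blobs: List[str]) -> None:
        """Schedule the highlighting in the background, off the request path."""
        self._pending[mr_id] = self._pool.submit(self._highlight_all, blobs)

    def first_diffs_load(self, mr_id: str, blobs: List[str]) -> List[str]:
        """Use the pre-computed result if there is one, else highlight inline."""
        future = self._pending.get(mr_id)
        if future is None:
            return self._highlight_all(blobs)
        return future.result()  # waits only if the background job is unfinished

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```

Note that every created MR gets highlighted here, even one nobody ever opens, which is exactly the extra caching and memory cost listed as the downside.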

Although the frontend still needs some work to handle the amount of data being received, the backend would be improved here.

I wonder if there would be a place in Gitaly for a gRPC endpoint that could handle that job as a second step. There is Chroma in the Go realm, and Go probably handles CPU-bound work like highlighting more efficiently (though we should still give Chroma a try to confirm).

cc @DouweM @jramsay

Also cc'ing @smcgivern for now 😄

Edited by Oswaldo Ferreira