Projects::BlobController#show consumes CPU heavily and allocates tons of memory
Problem
This issue is a spin-off from #840 (closed).
Projects::BlobController#show is the endpoint that shows the content of a file at a particular commit in the Web UI, for example GitLab's CHANGELOG. By default, a file is displayed in rich format, which means we have to render the original file (Markdown, for example) into a user-friendly format (HTML, for example) before displaying it in the UI. Rendering is costly in both CPU and memory, especially for a big file. Let's look at some numbers:
- The original file size of https://gitlab.com/gitlab-org/gitlab/-/blob/master/CHANGELOG.md is 418 KB.
- The rendered HTML file size is 3.4 MB.
- On a cache hit, it takes a tiny amount of time and effort to return the result.
- Otherwise, it consistently takes 6-15 seconds of CPU time and allocates 800 MB to 3.6 GB of memory to render the content. Unfortunately, the caches in a busy repository get invalidated very quickly.
The worst part is that the whole rendering process happens on the web servers.
Solutions
Solution 1: Pre-caching big files after a push
After a big file is pushed, the web server receives a hook from Git. We then schedule a background job that renders the file and pre-caches the result.
- Advantages: simple and easy to implement; the current flow stays intact
- Disadvantages:
  - If a file is never accessed, rendering it is a big waste of resources
  - If a file never changes, its cache still expires before long, and no new push arrives to re-render it
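The flow of Solution 1 can be sketched roughly as follows. This is a minimal, self-contained illustration, not GitLab code: `PreCacher`, `render_rich`, and `BIG_FILE_THRESHOLD` are hypothetical names, the in-memory hash and array stand in for the real cache and job queue, and the 100 KB cutoff is an arbitrary assumption.

```ruby
# Hypothetical sketch of Solution 1; none of these names are GitLab APIs.
BIG_FILE_THRESHOLD = 100 * 1024 # bytes; assumed cutoff for a "big" file

# Stand-in for the real (expensive) Markdown-to-HTML rendering.
def render_rich(content)
  "<p>#{content}</p>"
end

class PreCacher
  attr_reader :cache, :queue

  def initialize
    @cache = {} # blob id => rendered HTML (stand-in for the real cache)
    @queue = [] # pending jobs (stand-in for a background job queue)
  end

  # Called from the post-push hook with the files changed by the push.
  # It only schedules work; nothing is rendered in the web process.
  def on_push(changed_files)
    changed_files.each do |file|
      next if file[:content].bytesize < BIG_FILE_THRESHOLD

      @queue << file
    end
  end

  # Run by a background worker: render each queued file and cache the result.
  def drain
    until @queue.empty?
      file = @queue.shift
      @cache[file[:blob_id]] = render_rich(file[:content])
    end
  end
end
```

Note that the sketch renders every big file in a push whether or not anyone ever views it, which is exactly the waste listed under the disadvantages.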
 
Solution 2: Move the whole rendering process to the background
Whenever a user accesses the page and the file is not rendered yet, we trigger a background job to render it and tell the UI to poll for the result.
This solution, in theory, doesn't change the overall waiting time for users. Instead, it moves the heavy work from the web servers to the background workers, which are supposed to be the right place for such jobs.
To push this further, @robotmay_gitlab has a great suggestion: we can put the rendered file in object storage and let Workhorse serve the static file from object storage directly, without going through the Rails layer.
- Advantages:
  - The web servers can offload the heavy work to the background workers, so they can handle more requests.
  - We can control the resources used for rendering. The rendering time is likely to decrease if we schedule the jobs on nodes with high CPU capacity (thanks to worker_resource_boundary).
- Disadvantages:
  - The effort is high. We need to change the flow completely.
  - This approach may not be good for small files.
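The request/poll cycle of Solution 2 could look something like the sketch below. It is only an illustration under stated assumptions: `BlobRenderService` and its methods are hypothetical names, the hashes stand in for the real cache (or object storage) and job queue, and the renderer is passed in as a plain callable.

```ruby
# Hypothetical sketch of Solution 2; the names are illustrative, not GitLab APIs.
class BlobRenderService
  def initialize(renderer)
    @renderer = renderer # callable that does the expensive rendering
    @rendered = {}       # blob id => HTML (stand-in for cache or object storage)
    @pending  = {}       # blob id => raw content waiting for a worker
  end

  # Web request path: never renders inline.
  # Returns the HTML if it is ready; otherwise schedules a job (once per blob)
  # and tells the caller to poll again.
  def show(blob_id, content)
    return { status: :ready, html: @rendered[blob_id] } if @rendered.key?(blob_id)

    @pending[blob_id] ||= content
    { status: :pending }
  end

  # Background worker path: render everything that is waiting.
  def perform_pending
    @pending.each { |blob_id, content| @rendered[blob_id] = @renderer.call(content) }
    @pending.clear
  end
end

service = BlobRenderService.new(->(text) { "<p>#{text}</p>" })
first  = service.show("abc123", "# Changelog") # not rendered yet, UI should poll
service.perform_pending                         # a worker picks up the job
second = service.show("abc123", "# Changelog") # the next poll gets the HTML
```

The web path only does hash lookups, so its cost stays flat regardless of file size. For small files the extra round trip may cost more than rendering inline, which matches the last disadvantage above.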
 
Impact and priorities
This issue is of extremely low priority. The number of troublesome requests is small: 8,781 out of 14,850,593 requests to this endpoint, or roughly 0.06%. Judging by these direct numbers alone, this issue can be completely ignored.
However, the impact of the sustained CPU consumption and memory allocation is still unclear. We need to collect more metrics to support this issue, and then reconsider its priority.
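For scale, the share of troublesome requests can be worked out from the two counts quoted above. The helper name `share_percent` is just an illustration.

```ruby
# Percentage of requests that fall into the "troublesome" bucket.
def share_percent(part, total)
  part.to_f / total * 100
end

troublesome = 8_781
total       = 14_850_593

puts format("%.3f%% of requests", share_percent(troublesome, total)) # prints "0.059% of requests"
puts "about 1 in #{(total.to_f / troublesome).round} requests"      # prints "about 1 in 1691 requests"
```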

