Improve diff_files endpoint performance
What does this MR do?
On .com, we currently load commit diffs via a special endpoint asynchronously (it's been a long-standing feature toggle, async_commit_diff_files
, which was never rolled out beyond gitlab-org/gitlab
). In #326316 we found that diff rendering is very expensive both in terms of CPU and memory, but this endpoint specifically was doing some unnecessary work. This MR is trying to improve this by:
- Rendering the response as HTML not JSON. This removes the need to escape potentially very large diff HTML strings to stuff them into a JSON envelope.
- Optimize a few unnecessary allocations on frequently visited code paths during diff rendering.
Results
Taking an object profile for the diff_files
endpoint and a large commit with ~2000 diffs across 86 files:
Before
{"mem_objects":2587625,"mem_bytes":210585416,"mem_mallocs":621598}
After
{"mem_objects":2350725,"mem_bytes":173328440,"mem_mallocs":611828}
This saves us ~240k allocations (which also improves GC churn and thus saves CPU cycles) and 37MB of memory per call.
Preview
I've enabled the feature toggle on a review-app. You can look at how the commit is rendered here: https://gitlab-review-326316-imp-qmshtg.gitlab-review.app/gitlab-qa-sandbox-group/qa-test-2021-04-19-12-44-24-d48bb126e58af07f/autodevops-project-47056ff5ba73b673/-/commit/d2cbbaf58dc845048c3bb630f80226000f0d2dd3
Does this MR meet the acceptance criteria?
Conformity
-
📋 Does this MR need a changelog?-
I have included a changelog entry. -
I have not included a changelog entry because _____.
-
-
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers -
Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Related to #326316