Skip to content

De-duplicate binary lookups for identical commit diff paths

Matthias Käppler requested to merge 326316-de-dup-binary-blob-lookups into master

What does this MR do?

In https://gitlab.com/gitlab-org/gitlab/-/issues/326316 we are looking to improve performance of the commit endpoint, which renders diffs for potentially many files. One offender we found was a function that probes into the git blob data to decide whether the file is binary or not. This function takes about 50% of CPU time spent in diff file rendering. We are already looking to cache these calls in !60128 (merged) and are also considering to push this detection down into Gitaly in #329203, however:

  • the fix in !60128 (merged) will only help when this redis backed cache is warm; initial rendering time for a commit diff is not helped by it
  • extracting this logic from Rails is a larger effort and unclear if or when it will happen

As a short-term mitigation we can try to at least skip unnecessary lookups, which this MR tries to accomplish.

This optimization here reduces the number of times we need to call into this function by leveraging the fact that when computing diffs, we need to fetch all files twice using different revisions. Since the chance that a given file changes its type ("binary-ness") across two revisions is remote at best (though not impossible), we cache this lookup by path in-memory now so that for subsequent gitaly fetches of the same file, we can re-use the previous lookup result.

We need to understand that this MR accepts a trade-off between a slim chance of being incorrect for a large gain in performance. For instance, should a file change its type from e.g. text to PNG or vice versa across two revisions, we will have cached the wrong result. However, this would be immediately fixed with a page reload because the results are only cached per-request (per iteration of a gitaly response even.)

Results

As expected, this cuts call count and CPU time spent in this detection in half, from ~50% to ~25% of all CPU time spent in rendering diffs:

Before

See https://gitlab.com/gitlab-org/gitlab/-/issues/326316#note_558912202

After

Screenshot_from_2021-04-28_11-28-13

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Related to #326316

Edited by Matthias Käppler

Merge request reports