
Lessen Gitaly N+1 errors in markdown processing

What does this MR do?

When rendering a page from raw markdown, we process each chunk of markdown (a comment, an MR description, etc.) separately. Each chunk passes through a Banzai::Pipeline, which defines a series of Banzai::Filter classes that process the markdown in a defined order.

In CommitReferenceFilter, whenever our regex identifies a string of characters that might be a commit SHA, we do a Gitaly lookup. This is usually fine, but it can spiral into a large number of calls for some chunks. One pathological example is a comment that includes timing data for dozens of method calls, where the timing data consists of long strings of numbers, which the regex identifies as possible SHAs and attempts to look up one by one.
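To make the failure mode concrete, here is a minimal sketch of how numeric timing data can trigger one lookup per candidate. The `SHA_LIKE` pattern and the `CountingLookup` class are hypothetical stand-ins invented for illustration; they are not the filter's actual regex or a real Gitaly client.

```ruby
# Hypothetical SHA-like pattern: 7 to 40 hex characters on word boundaries.
# This is an assumption for illustration, not the filter's actual regex.
SHA_LIKE = /\b[0-9a-f]{7,40}\b/

# Hypothetical stand-in for a per-candidate repository lookup.
# It only counts how many times it is asked about a candidate.
class CountingLookup
  attr_reader :calls

  def initialize(known_shas)
    @known = known_shas
    @calls = 0
  end

  # Each call here represents one simulated round trip.
  def find(sha)
    @calls += 1
    @known.include?(sha)
  end
end

text = "load_user 1234567890 render 9876543210 fixed in deadbeef1"

# Decimal digits are valid hex characters, so the timing values match too.
candidates = text.scan(SHA_LIKE)

lookup = CountingLookup.new(["deadbeef1"])
found = candidates.select { |candidate| lookup.find(candidate) }
```

Only one of the three candidates is a real commit in this sketch, yet every candidate costs a lookup, so the number of calls grows with the amount of numeric noise in the chunk.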

In this MR I'm adapting the batch processing code from IssuableReferenceFilter to create a sort of local cached hash of Commit objects, siloed by project. This reuses a proven technique and allows for a single Gitaly request for the entire page request. My WIP code, by contrast, batched commit references by document node (or chunk), which would have reduced Gitaly calls, but not as drastically, and would still have been reinventing the wheel.
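The general shape of a per-project batch cache can be sketched roughly as follows. `FakeRepository`, `CommitBatchCache`, and their methods are hypothetical names made up for this illustration, under the assumption that a repository can resolve many SHAs in one request; this is not the code in this MR or in IssuableReferenceFilter.

```ruby
# Hypothetical stand-in for a repository that can resolve many SHAs at once.
class FakeRepository
  attr_reader :requests

  def initialize(commits)
    @commits = commits # { sha => commit title }
    @requests = 0
  end

  # One simulated round trip, however many SHAs are passed in.
  def commits_by(shas)
    @requests += 1
    @commits.slice(*shas)
  end
end

# Collects candidate SHAs per project first, then resolves each project's
# set with a single batched request on first access.
class CommitBatchCache
  def initialize
    @pending = Hash.new { |hash, project| hash[project] = [] }
    @loaded = {}
  end

  def register(project, sha)
    @pending[project] << sha
  end

  # Note: SHAs registered after the first access for a project are not
  # fetched in this simplified sketch.
  def commit(project, sha)
    @loaded[project] ||= project.commits_by(@pending[project].uniq)
    @loaded[project][sha]
  end
end

repo = FakeRepository.new("deadbeef1" => "Fix N+1")
cache = CommitBatchCache.new
shas = %w[1234567890 9876543210 deadbeef1]

shas.each { |sha| cache.register(repo, sha) }
titles = shas.map { |sha| cache.commit(repo, sha) }
```

In this sketch the three candidates cost one simulated request instead of three, and unknown candidates simply resolve to `nil`. Keying the cache by project keeps identical SHA-like strings in different projects from being confused with one another.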

Does this MR meet the acceptance criteria?

Conformity

Performance and testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team

Closes #60449 (closed)

Edited by Kerri Miller
