Skip to content

Improve performance of Banzai::Filter::ReferenceFilter

Nikola Milojevic requested to merge 217580-improve-performance-of-blob into master

Problem

Screenshot_at_Jun_25_11-37-53

From the flame graph we can see that it takes 38.2s to load, and Banzai::Filter::ReferenceFilter#each_node looked like the biggest offender (7.9-11.7s <-> 25-30% of time) and a good candidate to investigate.

It turned out that Banzai::Filter::ReferenceFilter#each_node is creating and iterating 22346 Nokogiri::XML::Text objects for this CHANGELOG.md file.

I discovered that there were a lot of empty lines, so with skipping empty lines, we reduced the number to 11098 for this particular file.

It turned out that we have various Banzai filters, such as:

  • Banzai::Filter::IterationReferenceFilter
  • Banzai::Filter::UserReferenceFilter
  • Banzai::Filter::ProjectReferenceFilter
  • Banzai::Filter::IssueReferenceFilter
  • Banzai::Filter::MergeRequestReferenceFilter
  • Banzai::Filter::SnippetReferenceFilter
  • Banzai::Filter::CommitRangeReferenceFilter
  • Banzai::Filter::CommitReferenceFilter

We are parsing the same doc and filtering the same nodes over and over again (most of the time).

If these filters don't find any references and they pass to the next filter unchanged doc, that we are filtering again. In case when they actually replace the text with an actual link, doc changes, and those nodes should reflect those changed nodes as well.

Example:

Let say that we have the following text:

 !122 #12323 @root .`

This text will be parsed to

[<Nokogiri::XML::Text "!122 #1 @root .">]

ReferenceMergeRequestFilter

will parse the doc using each_node and iterate through document searching for project_reference_pattern (!merge_request_id), and it will replace this node with 2 new nodes:

[<Nokogiri::XML::Element name="a" attributes=[<Nokogiri::XML::Attr name="href" value="../merge_requests/122">, ...>,

<Nokogiri::XML::Text " #12323 @root .">]

ReferenceIssuesFilter

will parse the doc using each_node again and iterate through the document searching for issue_reference_pattern (#issue_id), and it will replace the new node:

[<Nokogiri::XML::Text " #12323 @root .">]

with:

[<Nokogiri::XML::Element name="a" attributes=[<Nokogiri::XML::Attr name="href" value="../issues/12323">, ...>,

<Nokogiri::XML::Text " @nmilojevic1 .">]

And the actual document will now look like this:

[ <Nokogiri::XML::Element name="a" attributes=[<Nokogiri::XML::Attr name="href" value="../merge_requests/122">, ...>, 

<Nokogiri::XML::Element name="a" attributes=[<Nokogiri::XML::Attr name="href" value="../issues/12323">, ...>,

<Nokogiri::XML::Text " @nmilojevic1 .">]

ReferenceUserFilter

will parse the doc using each_node again and iterate through the document searching for user reference pattern (@user_name), and it will replace the new node:

[<Nokogiri::XML::Text " @nmilojevic1 .">]

with:

[<Nokogiri::XML::Element name="a" attributes=[<Nokogiri::XML::Attr name="href" value="../users/nmilojevic1">, ...>,

<Nokogiri::XML::Text " .">]

And the actual document will now look like this:

[ <Nokogiri::XML::Element name="a" attributes=[<Nokogiri::XML::Attr name="href" value="../merge_requests/122">, ...>, 

<Nokogiri::XML::Element name="a" attributes=[<Nokogiri::XML::Attr name="href" value="../issues/12323">, ...>,

<Nokogiri::XML::Element name="a" attributes=[<Nokogiri::XML::Attr name="href" value="../users/nmilojevic1">, 

<Nokogiri::XML::Text " .">]

When we parse the whole (big) doc each time using each_node for each reference_filter, it searches for text and unreferenced links through 22.000 lines, each time.

What does this MR tries to do?

When we replace the node, we update the list with replaced node (keeping text and omitting referenced links), and we pass that updated node list to the next filter.

Feature Flags

This optimization is under the feature flag:

  • :update_nodes_for_banzai_reference_filter (with project scope)

Rollout plan

  • We should enable Feature Flage for a few projects and test to make sure it works.
  • Enable FF for Gitlab project
  • Prepare MR to enable if for default for everyone
  • Update documentation
  • Prepare MR to remove feature flag

Conformity

Availability and Testing

Tested with Gitlabhq project, CHANGELOG.md file

https://staging.gitlab.com/qa-perf-testing/gitlabhq/-/preview/master/CHANGELOG.md

Screenshot_at_Jun_25_11-37-53 Screenshot_at_Jun_29_10-41-11

master branch

Screenshot_at_Jun_16_17-49-56

217580-improve-performance-of-blob branch

Screenshot_at_Jun_26_16-52-10

branch method time total time
master Banzai::Filter::ReferenceFilter#each_node 7.9s 34.7s
origin/217580-improve-performance-of-blob Banzai::Filter::ReferenceFilter#each_node 0.574s 19.82s

In this particular case, this File is loading 75% faster than before.

Banzai::Filter::ReferenceFilter#each_node is 13x faster then before

I tested File preview as well:

branch total time s Banzai::Filter::ReferenceFilter#each_node
master 27.6s 6.00s (22%)
217580-improve-performance-of-blob 15.53s 0.370s (2.4%)

We can see similar improvements as we have with blob controller.

The issue preview is affected as well.

Related to #217580 (closed)

Edited by Nikola Milojevic

Merge request reports