Utilize Nokogiri streaming capabilities for CoverageReportWorker
What does this MR do and why?
Instead of loading the whole XML file into memory, we use Nokogiri::XML::SAX::Document
to process each node as soon as it is read.
This way the streaming parser extracts nodes one at a time.
This should reduce the worker's memory consumption for very large XML files.
How to set up and validate locally
- Download and extract cobertura-coverage-2.xml.gz
- In the Rails console, execute:

```ruby
coverage_report = Gitlab::Ci::Reports::CoverageReports.new
Gitlab::Ci::Parsers::Coverage::Cobertura.new.parse!(File.open("../cobertura-coverage-2.xml"), coverage_report)
```
Results
Tested locally with cobertura-coverage-2.xml.gz
I used Gitlab::Utils::Measuring to measure time and memory usage:
```ruby
::Gitlab::Utils::Measuring.new.with_measuring do
  rss_start = `ps -o rss= -p #{$$}`.to_f / 1024
  # parse xml file
  rss_after = `ps -o rss= -p #{$$}`.to_f / 1024
  rss_diff = rss_after - rss_start
end
```
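The measurement pattern above can be sketched as a self-contained helper (the helper name is mine; it assumes a Unix-like system where `ps -o rss=` reports the resident set size in KB):

```ruby
# Wrap any block with a before/after RSS measurement, as in the snippet
# above. Values are in megabytes.
def measure_rss_mb
  rss_start = `ps -o rss= -p #{Process.pid}`.to_f / 1024
  yield
  rss_after = `ps -o rss= -p #{Process.pid}`.to_f / 1024
  { rss_after: rss_after, rss_diff: rss_after - rss_start }
end

# Example: allocate ~10 MB of strings and observe the growth.
result = measure_rss_mb { $data = Array.new(100_000) { 'x' * 100 } }
puts format('rss_after: %.1f MB, rss_diff: %.1f MB', result[:rss_after], result[:rss_diff])
```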
| | DOM parser | Nokogiri::XML::Reader | Nokogiri::XML::SAX |
|---|---|---|---|
| rss_after | 1172 MB | 584 MB | 460 MB |
| rss_diff | 722 MB | 134 MB | 4.2 MB |
| execution time | 6.58 sec | 5.45 sec | 2.54 sec |
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- [ ] I have evaluated the MR acceptance checklist for this MR.
Related to #351921 (closed)