
Utilize Nokogiri streaming capabilities for CoverageReportWorker

Nikola Milojevic requested to merge 351921-optimize-coverage-report-worker into master

What does this MR do and why?

Instead of loading the whole XML file into memory, we now use a Nokogiri::XML::SAX::Document to process the document as it is read, extracting nodes one at a time. This should reduce the worker's memory consumption for very large XML files.


How to set up and validate locally

  • Download and extract cobertura-coverage-2.xml.gz

  • In a Rails console, execute:

      coverage_report = Gitlab::Ci::Reports::CoverageReports.new
      Gitlab::Ci::Parsers::Coverage::Cobertura.new.parse!(File.open("../cobertura-coverage-2.xml"), coverage_report)

Results

Tested locally with cobertura-coverage-2.xml.gz

I used Gitlab::Utils::Measuring to measure execution time and memory usage:

  ::Gitlab::Utils::Measuring.new.with_measuring do
    rss_start = `ps -o rss= -p #{$$}`.to_f / 1024
    coverage_report = Gitlab::Ci::Reports::CoverageReports.new
    Gitlab::Ci::Parsers::Coverage::Cobertura.new.parse!(File.open("../cobertura-coverage-2.xml"), coverage_report)
    rss_after = `ps -o rss= -p #{$$}`.to_f / 1024
    rss_diff = rss_after - rss_start
  end
|                | DOM parser | Nokogiri::XML::Reader | Nokogiri::XML::SAX |
|----------------|------------|-----------------------|--------------------|
| rss_after      | 1172 MB    | 584 MB                | 460 MB             |
| rss_diff       | 722 MB     | 134 MB                | 4.2 MB             |
| execution time | 6.58 sec   | 5.45 sec              | 2.54 sec           |

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #351921 (closed)
