Utilize Nokogiri streaming capabilities for CoverageReportWorker
What does this MR do and why?
Instead of loading the whole XML file into memory, we use Nokogiri::XML::SAX::Document
to process each node as soon as it is read.
This way the streaming parser extracts nodes one at a time.
This should reduce the worker's memory consumption for very large XML files.
How to set up and validate locally
- Download and extract cobertura-coverage-2.xml.gz
- In the Rails console, execute:

```ruby
coverage_report = Gitlab::Ci::Reports::CoverageReports.new
Gitlab::Ci::Parsers::Coverage::Cobertura.new.parse!(File.open("../cobertura-coverage-2.xml"), coverage_report)
```
Results
Tested locally with cobertura-coverage-2.xml.gz
I used Gitlab::Utils::Measuring to measure time and memory usage:
```ruby
::Gitlab::Utils::Measuring.new.with_measuring do
  rss_start = `ps -o rss= -p #{$$}`.to_f / 1024
  # parse xml file
  rss_after = `ps -o rss= -p #{$$}`.to_f / 1024
  rss_diff = rss_after - rss_start
end
```
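The measurement pattern above can be sketched as a self-contained helper (the helper name is mine; it assumes a Unix-like system where `ps -o rss=` reports the resident set size in KB):

```ruby
# Wrap any block with a before/after RSS measurement, as in the snippet
# above. Values are in megabytes.
def measure_rss_mb
  rss_start = `ps -o rss= -p #{Process.pid}`.to_f / 1024
  yield
  rss_after = `ps -o rss= -p #{Process.pid}`.to_f / 1024
  { rss_after: rss_after, rss_diff: rss_after - rss_start }
end

# Example: allocate ~10 MB of strings and observe the growth.
result = measure_rss_mb { $data = Array.new(100_000) { 'x' * 100 } }
puts format('rss_after: %.1f MB, rss_diff: %.1f MB', result[:rss_after], result[:rss_diff])
```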
| | DOM parser | Nokogiri::XML::Reader | Nokogiri::XML::SAX |
|---|---|---|---|
| rss_after | 1172 MB | 584 MB | 460 MB |
| rss_diff | 722 MB | 134 MB | 4.2 MB |
| execution time | 6.58 sec | 5.45 sec | 2.54 sec |
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- [ ] I have evaluated the MR acceptance checklist for this MR.
Related to #351921 (closed)