Implement SAX::Parser for the Cobertura parser
Problem
Parsing large (> 300 MB) Cobertura XML files can exhaust memory.
We already discussed this potential ~performance improvement last year, and now might be a good time to reevaluate our strategy.
Proposal
As far as I understand, our current implementation loads and parses the whole XML file in one go.
By using a SAX parser (`Nokogiri::XML::SAX::Parser`) we could stream through the XML, handling each element as it is read instead of loading everything at once. This should lower memory usage and lead to a ~performance improvement.
We could also apply this approach to our JUnit parser in a later iteration.
Below is some basic code we could take inspiration from:
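```ruby
require 'nokogiri'
require 'memory_profiler'

# SAX handler: Nokogiri calls these methods while it streams through the
# file, so the whole document tree is never held in memory.
class MyDocument < Nokogiri::XML::SAX::Document
  def end_document
    puts 'the document has ended'
  end

  def start_element(name, attributes = [])
    # puts "#{name} started"
  end
end

parser = Nokogiri::XML::SAX::Parser.new(MyDocument.new)

# Measure allocations while streaming the report; the block form of
# File.open closes the file handle once parsing finishes.
report = MemoryProfiler.report do
  File.open('cobertura-coverage.xml') { |file| parser.parse(file) }
end

report.pretty_print
```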