JUnit feature should be able to parse JUnit.xml files with large nodes

Problem

As a developer (Sasha), when my pipeline creates a JUnit XML file that contains very large <system-out> nodes (8 MB or more), I want the JUnit report to parse correctly so that I can see my test output without downloading the file and viewing it outside of GitLab.

Right now, when a JUnit report contains one or more very large <system-out> nodes, Nokogiri cannot parse the XML file, and an error is shown in the UI.

Note: This replaces a previous issue that was iterated on several times until getting to this state. That issue was closed to get a fresh count of users who need support for large nodes in JUnit reports.

Proposal

Extremely large nodes that might contain useful output sometimes fail to parse. The file attached to #25357 (comment 275846278) contains a single test run with an extremely large <system-out> node. Some customers may find it valuable to have large outputs like this in their tests, and surfacing them partially or fully in the UI may add value.

We may also be able to make the scan "on demand" where large nodes are ignored unless a customer specifically requests them. There would still be a performance and potentially reliability concern but it's an option.

The purpose of this issue is to add a setting on the /admin/application_settings/ci_cd page that enables the Nokogiri "huge" option for JUnit report parsing.

> Nokogiri::XML.parse(File.open("large.xml")).root.children.first.text.length
 => 9999894 
> puts Nokogiri::XML.parse(File.open("large.xml")).errors
xmlSAX2Characters: huge text node
Extra content at the end of the document

... add the huge option ...

> Nokogiri::XML.parse(File.open("large.xml")) {|c| c.huge}.root.children.first.text.length
 => 14065665
> puts Nokogiri::XML.parse(File.open("large.xml")) {|c| c.huge}.errors
 => nil

Stumbled across this here

Enabling this may carry performance and memory-usage risks. We should note them next to the setting, and the setting should default to OFF.

This issue may be much easier to do if #217514 (closed) is re-opened and completed.

Further Questions

Some questions that come to mind are:

What is the maximum useful node size for <system-out> nodes?
Should we treat <system-out> nodes in JUnit.xml differently from other nodes because they may contain stack traces, etc.?
Can we truncate the node and read the tail? Would the head make more sense?
If we do surface truncated data should we put a link to download the full report in the UI?
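To make the head-versus-tail question concrete, here is a minimal sketch of what truncating a node's text could look like. The `truncate_output` helper, its `limit` and `keep` parameters, and the sizes used are all hypothetical and chosen for illustration; nothing here reflects an existing GitLab implementation or a decided limit.

```ruby
# Hypothetical helper: keep at most `limit` characters from either the
# start (:head) or the end (:tail) of a node's text.
def truncate_output(text, limit:, keep: :tail)
  return text if text.length <= limit

  case keep
  when :head then text[0, limit]
  when :tail then text[-limit, limit]
  else raise ArgumentError, "keep must be :head or :tail"
  end
end

output = "setup...\n" + ("log line\n" * 5) + "AssertionError: expected 1, got 2\n"

tail = truncate_output(output, limit: 34)
head = truncate_output(output, limit: 9, keep: :head)

# Comparing lengths tells the caller whether anything was cut, which is
# the point at which a "download full report" link could be offered.
truncated = tail.length < output.length
puts tail
puts head
puts truncated
```

Keeping the tail favors the final lines of the output, where failures and stack traces often appear; keeping the head favors setup and early logging.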

Edited Jun 14, 2021 by James Heimbuck