JUnit feature should be able to parse JUnit.xml files with large nodes

Problem

As a developer (Sasha), when my pipeline creates a JUnit XML file that contains very large <system-out> nodes (8 MB+), I want the JUnit report to parse correctly so that I can see my test output without downloading the file and viewing it outside GitLab.

Right now, when the JUnit report contains one or more very large <system-out> nodes, Nokogiri cannot parse the XML file and an error is shown in the UI.

Proposal

Extremely large nodes, which may contain useful output, sometimes fail to parse. The file attached to #25357 (comment 275846278) contains a single test run with an extremely large <system-out> node. Some customers may have good reasons to produce test output this large, and surfacing it partially or fully in the UI could be useful to them.

This issue proposes adding a setting on the /admin/application_settings/ci_cd page that enables Nokogiri's "huge" parse option for JUnit report parsing.

> Nokogiri::XML.parse(File.open("large.xml")).root.children.first.text.length
 => 9999894 
> puts Nokogiri::XML.parse(File.open("large.xml")).errors
xmlSAX2Characters: huge text node
Extra content at the end of the document

With the huge option enabled:

> Nokogiri::XML.parse(File.open("large.xml")) {|c| c.huge}.root.children.first.text.length
 => 14065665
> puts Nokogiri::XML.parse(File.open("large.xml")) {|c| c.huge}.errors
 => nil

Stumbled across this here

Enabling this option may increase parse time and memory usage. The setting should note these risks and should default to OFF.
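A minimal sketch of how the proposed setting could gate the option, in Ruby to match the irb session above. For context, libxml2 caps a single text node at 10,000,000 bytes (XML_MAX_TEXT_LENGTH) unless XML_PARSE_HUGE is set, which is consistent with the roughly 10M characters recovered in the first irb call. The helper name junit_parse_config and the huge_enabled keyword are hypothetical stand-ins for the admin setting; the only Nokogiri API assumed is the ParseOptions#huge call already used above. The sketch does not load Nokogiri itself, so it works with any config object that responds to huge.

```ruby
# Hypothetical helper: returns the block Nokogiri yields its ParseOptions to.
# `huge_enabled` stands in for the proposed admin setting (assumed name);
# it defaults to false so the option stays OFF unless an admin opts in.
def junit_parse_config(huge_enabled: false)
  lambda do |config|
    # Nokogiri::XML::ParseOptions#huge sets libxml2's XML_PARSE_HUGE flag,
    # which relaxes hardcoded limits such as the maximum text-node size.
    config.huge if huge_enabled
    config
  end
end

# Usage with Nokogiri (not required by this sketch):
#   doc = Nokogiri::XML(File.read("large.xml"), &junit_parse_config(huge_enabled: true))
```

Keeping the decision in one place means the default-OFF behaviour is enforced by the helper rather than by every call site.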

Further Questions

Some questions that come to mind are:

What is the maximum useful node size for <system-out> nodes?
Should we treat <system-out> nodes in JUnit XML differently from other nodes, since they may contain stack traces and other diagnostic output?
Could we truncate the node and keep only the tail, or would the head be more useful?
If we do surface truncated data should we put a link to download the full report in the UI?
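One way to explore the truncation questions is a helper that keeps either the head or the tail of a <system-out> body, up to a byte limit, and marks the side where output was dropped. This is a hypothetical sketch: truncate_output, the limit, and the marker text are illustrative, and it assumes the node text has already been extracted from the report.

```ruby
# Hypothetical marker shown where output was dropped.
TRUNCATION_MARKER = "[... output truncated; download the full report to see all of it ...]"

# Keep at most `limit` bytes of a <system-out> body, taken from either the
# start (:head) or the end (:tail), and add the marker on the side that was cut.
def truncate_output(text, limit:, keep: :tail)
  unless %i[head tail].include?(keep)
    raise ArgumentError, "keep must be :head or :tail"
  end
  return text if text.bytesize <= limit

  kept = keep == :head ? text.byteslice(0, limit) : text.byteslice(-limit, limit)
  # Cutting on a byte boundary can split a multibyte character;
  # scrub("") drops the resulting partial character.
  kept = kept.scrub("")
  keep == :head ? "#{kept}\n#{TRUNCATION_MARKER}" : "#{TRUNCATION_MARKER}\n#{kept}"
end
```

Keeping the tail tends to preserve final failure messages and stack traces, while keeping the head preserves setup output; which one is more useful is the open question above.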

Original Issue Content

Summary

I'm running a CI/CD pipeline with a JUnit step, and I'd expect to see something like this image.

This is what I'm seeing, even though my test actually failed.

Steps to reproduce

Set up this stage in your pipeline:

feature_integration_tests:
  stage: unit_tests
  <<: *base_image
  only:
    - /^feature\/*/
  script:
    - mvn $MAVEN_CLI_OPTS verify
  artifacts:
    reports:
      junit:
        - target/surefire-reports/TEST-*.xml

The file in question is larger than 9.75 MB; smaller files seem to be fine.

See example file in comment below.

Example Project

I'm on a private instance, can't share anything.

What is the current bug behavior?

Even with a failed test, I still see a green checkmark, and instead of the test names I see only the stage that launched them.

What is the expected correct behavior?

I'd expect to see something like this image, with my test name and status for each executed unit test.

Relevant logs and/or screenshots

Uploading artifacts...

target/surefire-reports/TEST-*.xml: found 1 matching files

Uploading artifacts to coordinator... ok id=4253 responseStatus=201 Created token=zGhSmVnj

ERROR: Job failed: exit code 1

Output of checks

Results of GitLab environment info

I'm not an admin on this instance, so I can't provide more information on that.

Results of GitLab application Check

I'm not an admin on this instance, so I can't provide more information on that.

Possible fixes / Plan to fix

Let's log the error, including the size of the file.
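A minimal sketch of that logging, assuming the parse step can be wrapped in a block. parse_junit_report and the log message format are hypothetical; the sketch re-raises after logging so that existing error handling still applies.

```ruby
require "logger"

# Hypothetical wrapper: read a JUnit report, hand its contents to the block
# (the real parser), and if anything raises, log the error together with
# the file size before re-raising it.
def parse_junit_report(path, logger:)
  yield File.read(path)
rescue StandardError => e
  size = File.size?(path) || 0 # File.size? returns nil for a missing or empty file
  logger.error(
    "Failed to parse JUnit report #{path} (#{size} bytes): #{e.class}: #{e.message}"
  )
  raise
end
```

Logging the size next to the error would make it easier to check whether failures line up with the roughly 10 MB threshold reported above.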

Edited Aug 19, 2020 by Ricky Wiens