# JUnit data performance improvements (#2854)

## Overview

This epic started as an issue after the release of the [JUnit Report tab on the pipeline page](https://docs.gitlab.com/ee/ci/junit_test_reports.html#viewing-junit-test-reports-on-gitlab). Since then, a number of issues have been spawned and several approaches taken to improve the performance of parsing the JUnit data. This epic collects those items so we can iterate towards the goal of loading the pipeline page for the gitlab.org project ~~in 2 - 3 seconds~~ as fast with the JUnit report feature flag enabled as with it disabled. Once that threshold is met, the feature flag will be removed and the feature enabled by default.

Large JUnit reports will still make the tab slow to load once it is clicked. Future enhancements to the loading time of the report itself will be handled in a separate epic.

## Problem

While visiting https://gitlab.com/gitlab-org/gitlab/pipelines/99300747 I noticed very poor gitlab~2731248 of this page.

It seems that this happens due to:

1. https://gitlab.com/gitlab-org/gitlab/commit/e524b752982d27bf63a74bbd3b251b09a1647bf4#note_251930379

   > `%span.badge.badge-pill= pipeline.test_reports.total_count`

   This makes us parse all reports every time we render the main page.

Secondly, we also:

1. Load each report from Object Storage sequentially as part of the main request that presents the page,
1. Load all reports, which means that we can have 200-1000 different files to analyze,
1. Parse each report in memory as part of request processing, and we do it twice:
   1. to calculate the total number of tests,
   2. to show `test_report`,
1. Do not limit the upper size of parsed reports, so we can read files of several MBs into memory,
1. Load `test_report.json` as a single data set, without any pagination, which results in elevated memory pressure on Unicorn, long processing time, and extensive data transfer.

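
To illustrate the double-parsing point in the list above, here is a minimal sketch of parsing each report once and reusing the result for both the badge count and the report view. It is not GitLab's implementation: `MemoizedReports`, `suites`, and `parse_calls` are hypothetical names, and the regex scan is only a stand-in for a real XML parser.

```ruby
# Hypothetical sketch: parse each JUnit report once and reuse the result.
# None of these names come from the GitLab codebase.
class MemoizedReports
  attr_reader :parse_calls

  # raw_reports: array of JUnit XML strings (assumed already fetched).
  def initialize(raw_reports)
    @raw_reports = raw_reports
    @parse_calls = 0
  end

  # Test case names per report, computed at most once.
  def suites
    @suites ||= @raw_reports.map { |xml| parse(xml) }
  end

  # Badge counter: reuses the memoized parse result.
  def total_count
    suites.sum(&:size)
  end

  # Data for the report view: reuses the same parse result.
  def test_report
    suites.flatten
  end

  private

  # Stand-in parser: scans for <testcase name="..."> attributes.
  # A real implementation would use a proper XML parser.
  def parse(xml)
    @parse_calls += 1
    xml.scan(/<testcase\b[^>]*\bname="([^"]*)"/).flatten
  end
end

reports = MemoizedReports.new([
  '<testsuite><testcase name="a"/><testcase name="b"/></testsuite>',
  '<testsuite><testcase name="c"/></testsuite>'
])
puts reports.total_count          # 3
puts reports.test_report.inspect  # ["a", "b", "c"]
puts reports.parse_calls          # 2 (one parse per report, not two)
```

Memoizing inside one request only removes the second parse. It does not address the sequential Object Storage loads or the missing pagination.
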
I also noticed that the endpoint `https://gitlab.com/gitlab-org/gitlab/pipelines/99300747/test_report.json` generates 1.5MB (for the GitLab repository) of gzipped JSON, and this takes around 14s to render for that pipeline.

### Steps to reproduce

The following was written up in #41268.

We followed the instructions [here](https://docs.gitlab.com/ee/ci/junit_test_reports.html) to add test reporting to our GitLab pipeline. Our CI basically looks like this:

```
test_rspec:
  stage: test
  retry: 2
  parallel: 30
  script:
    - bundle exec rake "knapsack:rspec[--format RspecJunitFormatter --out rspec.xml]"
  artifacts:
    paths:
      - rspec.xml
    reports:
      junit: rspec.xml
```

### What is the current *bug* behavior?

We have approximately 27000 total tests. Loading the tests tab takes a while, and then opening and viewing the errors is slow. This takes 14s to load on the GitLab project.

### What is the expected *correct* behavior?

~~The pipeline page loads in 1-2 seconds.~~ As fast with the feature flag on as off.

### Possible fixes

I believe that we should improve the performance of this feature before enabling it for everyone, by:

1. ~~Badge counter should be loaded asynchronously, and cached,~~
2. Limit the amount of data being processed,
3. If we have a significant amount of data, we should consider truncating the results to ensure consistent performance regardless of the size of the data, and clearly tell users that there's too much data to process (the limit should be somewhere around 10 files, and maybe a maximum of 10k tests?),
4. Ideally the data should be processed in the background and returned to Unicorn to ensure that the time taken by the endpoint is reasonable (but this can be hard due to the data size),
5. Maybe it would make sense to build a complete representation, store it as a pipeline-attached artifact, and return this artifact to the frontend (but this can be hard).

There are some additional ideas here: https://gitlab.com/gitlab-org/gitlab/-/issues/212368#note_325731908

We should also clearly document the gitlab~2731248 aspect of this feature, and have a test that validates its performance.

TL;DR: I would expect that if we cannot run this in a reasonable time for GitLab, we should truncate the results or show a message that there's too much data to process.
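
The truncation idea from the possible fixes could look roughly like the sketch below. It is an illustration only: `truncate_reports`, `TruncatedReport`, and the message text are hypothetical, and the limits simply reuse the numbers floated above (around 10 files, at most 10k tests).

```ruby
# Hypothetical limits, taken from the numbers suggested in this epic.
MAX_FILES = 10
MAX_TESTS = 10_000

# tests: the kept test entries; truncated: whether anything was dropped;
# message: text to show to the user when the data was truncated.
TruncatedReport = Struct.new(:tests, :truncated, :message)

# reports: array of arrays, one inner array of test entries per report file.
def truncate_reports(reports, max_files: MAX_FILES, max_tests: MAX_TESTS)
  tests = []
  reports.first(max_files).each do |file_tests|
    remaining = max_tests - tests.size
    break if remaining <= 0 # stop early once the test limit is reached

    tests.concat(file_tests.first(remaining))
  end

  total = reports.sum(&:size)
  truncated = tests.size < total
  message =
    if truncated
      "Too much data to process: showing #{tests.size} of #{total} tests."
    end

  TruncatedReport.new(tests, truncated, message)
end

# 12 files with 3 tests each, capped at 20 tests for the example.
sample = Array.new(12) { |i| Array.new(3) { |j| "file#{i}_test#{j}" } }
result = truncate_reports(sample, max_tests: 20)
puts result.tests.size # 20
puts result.truncated  # true
puts result.message
```

The sketch assumes the per-file test lists are already available. In practice the limit would have to apply before parsing, otherwise the expensive work has already been done.
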
