Pipeline JUnit View: poor performance when running on a big data set
While visiting https://gitlab.com/gitlab-org/gitlab/pipelines/99300747 I noticed very poor ~performance of this page.
It seems that this happens due to:
```haml
%span.badge.badge-pill= pipeline.test_reports.total_count
```
This forces us to parse all reports every time we try to render the main page.
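One possible direction, sketched below under the assumption of a Rails-style cache: wrap the expensive count in `Rails.cache.fetch` so the parsing cost is paid at most once per pipeline. The helper name and cache key are made up for illustration; this is not the actual GitLab implementation.

```ruby
# Illustrative only: cache the expensive count per pipeline so the badge does
# not force a re-parse of every JUnit artifact on each page render.
def cached_test_report_count(pipeline)
  Rails.cache.fetch(['pipelines', pipeline.id, 'test_report_count'], expires_in: 1.hour) do
    pipeline.test_reports.total_count
  end
end
```

The HAML above would then call `cached_test_report_count(pipeline)` instead of `pipeline.test_reports.total_count`; ideally the badge would also be fetched asynchronously rather than blocking the main page render.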
Secondly, we also:
- Load each report from Object Storage sequentially as part of the main request that renders the page,
- Load all reports, which means that we can have 200-1000 different files to analyze,
- Parse each report in-memory as part of request processing, and do it twice: once to calculate the total number of tests, and once to render `test_report`,
- Do not limit the upper size of parsed reports, so files that are many MBs in size can be read into memory (see the guard sketch after this list),
- Load `test_report.json` as a single data set, without any pagination, which results in elevated memory pressure on Unicorn, long processing time, and extensive data transfer.
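To make the missing limits concrete, a guard along the following lines could run before any parsing; the constants, method, and error class are hypothetical and only sketch the idea:

```ruby
# Illustrative only, not actual GitLab code: cap both the number of JUnit
# report files and the size of each file before reading anything into memory.
class TooMuchTestDataError < StandardError; end

MAX_REPORT_FILES     = 10           # rough limit suggested in "Possible fixes" below
MAX_REPORT_FILE_SIZE = 10.megabytes # assumed per-file ceiling (ActiveSupport)

def reports_within_limits!(report_artifacts)
  raise TooMuchTestDataError, 'too many report files' if report_artifacts.size > MAX_REPORT_FILES

  report_artifacts.each do |artifact|
    raise TooMuchTestDataError, 'report file too large' if artifact.size > MAX_REPORT_FILE_SIZE
  end

  report_artifacts
end
```

Anything that raises here would translate into the "too much data to process" message suggested below, instead of unbounded parsing inside the request.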
I also noticed that the endpoint https://gitlab.com/gitlab-org/gitlab/pipelines/99300747/test_report.json
generates 1.5MB of gzipped JSON (for the GitLab repository), and it takes around 14s to render for that pipeline.
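For reference, the numbers above can be reproduced with a throwaway script like the one below (plain Ruby stdlib, nothing GitLab-specific; the exact figures will vary):

```ruby
# Throwaway measurement script: how big is the gzipped test_report.json payload
# and how long does the endpoint take to respond?
require 'net/http'
require 'benchmark'

uri = URI('https://gitlab.com/gitlab-org/gitlab/pipelines/99300747/test_report.json')

response = nil
elapsed = Benchmark.realtime do
  Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    request = Net::HTTP::Get.new(uri)
    # Setting the header explicitly makes Net::HTTP return the body still
    # compressed, so bytesize below is the transferred (gzipped) size.
    request['Accept-Encoding'] = 'gzip'
    response = http.request(request)
  end
end

puts "gzipped payload: #{(response.body.bytesize / 1024.0 / 1024.0).round(2)} MB"
puts "response time:   #{elapsed.round(1)} s"
```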
### Steps to reproduce
The following was written up in #41268.
We followed the instructions here to add test reporting to our GitLab pipeline. Our CI basically looks like this:
```yaml
test_rspec:
  stage: test
  retry: 2
  parallel: 30
  script:
    - bundle exec rake "knapsack:rspec[--format RspecJunitFormatter --out rspec.xml]"
  artifacts:
    paths:
      - rspec.xml
    reports:
      junit: rspec.xml
```
### What is the current bug behavior?
We have approximately 27,000 total tests. Loading the Tests tab takes a while, and then opening and viewing the errors is slow.
This takes 14s to load for the GitLab project.
### What is the expected correct behavior?
The pipeline page loads in 1-2 seconds.
### Possible fixes
I believe that we should improve the performance of this feature before enabling it for everyone by:
- Loading the badge counter asynchronously and caching it (as in the caching sketch above),
- Limiting the amount of data being processed,
- Truncating the results when there is a significant amount of data, to ensure consistent performance regardless of data size, and clearly telling users that there is too much data to process (the limit should be somewhere around 10 files, and maybe a maximum of 10k tests),
- Ideally, processing the data in the background and only returning the result to Unicorn, so that the time taken by the endpoint stays reasonable (but this can be hard due to the data size),
- Maybe building a complete representation, storing it as a pipeline-attached artifact, and returning this artifact to the frontend (but this can be hard); a rough sketch of this direction follows the list.
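A rough sketch of the last two points combined, assuming a Sidekiq-style worker and a cache as the storage; all class, method, and action names here are hypothetical, and the 204 polling convention is arbitrary:

```ruby
# Hypothetical worker: build the full test report off the request path and
# store the serialized result, so Unicorn only ever serves a prepared blob.
class BuildPipelineTestReportWorker
  include Sidekiq::Worker

  def perform(pipeline_id)
    pipeline = Ci::Pipeline.find(pipeline_id)
    payload  = pipeline.test_reports.to_json # assumed serialization

    Rails.cache.write(['pipelines', pipeline_id, 'test_report_json'], payload)
  end
end

# Hypothetical controller action: serve the prepared payload if it exists,
# otherwise enqueue the worker and ask the frontend to poll again.
def test_report
  cached = Rails.cache.read(['pipelines', pipeline.id, 'test_report_json'])
  return render(json: cached) if cached

  BuildPipelineTestReportWorker.perform_async(pipeline.id)
  head :no_content
end
```

Storing the payload as a pipeline-attached artifact instead of in the cache would follow the same shape; the key point is that the parsing work never happens inside the web request.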
We should also clearly document the ~performance aspects of this feature and add a test that validates its performance.
TL;DR: if we cannot render this in a reasonable time for the GitLab project itself, we should truncate the results or show a message that there is too much data to process.