Improve performance of logs_tree action for Projects::RefsController under load into next tier
Related gitlab-org/quality/performance#224 (closed)
A user on HN, ddevault, reported that the project files viewing experience was slow. After a quick investigation this was verified. Our testing was missing out on a specific controller and action that is called on the file trees - Projects::RefsController#log_trees.json
that we immediately sought to add.
Once the test was updated to include the controller and point at a larger directory (~1800 files) the testing showed that the controller is indeed performing notably under our target of 500ms.
█ Results summary
* Environment: 10k
* Environment Version: 12.10.0-pre `71c834efe31`
* Option: 60s_200rps
* Date: 2020-04-16
* Run Time: 2m 8.45s (Start: 14:13:32 UTC, End: 14:15:40 UTC)
* GPT Version: v1.2.6
NAME | RPS | RPS RESULT | TTFB AVG | TTFB P90 | REQ STATUS | RESULT
--------------------------------|-------|----------------------|-----------|---------------------|----------------|-----------------
api_v4_projects_repository_tree | 200/s | 195.57/s (>160.00/s) | 78.69ms | 93.45ms (<500ms) | 100.00% (>95%) | Passed
web_project_files | 20/s | 9.05/s (>16.00/s) | 1898.75ms | 2674.71ms (<2000ms) | 100.00% (>95%) | FAILED³
Detailed Stats:
█ Web - Project Files Tree
data_received..............: 11 MB 190 kB/s
data_sent..................: 176 kB 2.9 kB/s
group_duration.............: avg=12175.10ms min=9955.84ms med=11863.04ms max=15312.28ms p(90)=14206.22ms p(95)=14596.49ms
http_req_blocked...........: avg=0.05ms min=0.00ms med=0.01ms max=5.57ms p(90)=0.01ms p(95)=0.04ms
http_req_connecting........: avg=0.03ms min=0.00ms med=0.00ms max=5.43ms p(90)=0.00ms p(95)=0.00ms
http_req_duration..........: avg=1899.49ms min=218.51ms med=2114.44ms max=2963.52ms p(90)=2675.27ms p(95)=2836.52ms
http_req_receiving.........: avg=0.70ms min=0.35ms med=0.49ms max=66.75ms p(90)=0.68ms p(95)=0.93ms
http_req_sending...........: avg=0.04ms min=0.02ms med=0.04ms max=0.14ms p(90)=0.06ms p(95)=0.08ms
http_req_tls_handshaking...: avg=0.00ms min=0.00ms med=0.00ms max=0.00ms p(90)=0.00ms p(95)=0.00ms
http_req_waiting...........: avg=1898.75ms min=217.88ms med=2110.42ms max=2963.04ms p(90)=2674.71ms p(95)=2836.00ms
✗ { endpoint:logs_tree }...: avg=2266.45ms min=1854.54ms med=2216.62ms max=2963.04ms p(90)=2739.00ms p(95)=2860.86ms
✓ { endpoint:tree }........: avg=269.84ms min=217.88ms med=256.50ms max=366.94ms p(90)=336.81ms p(95)=349.85ms
✗ http_reqs..................: 543 9.049988/s
✗ { endpoint:logs_tree }...: 443 7.383323/s
✗ { endpoint:tree }........: 100 1.666664/s
iteration_duration.........: avg=12024.83ms min=0.31ms med=11839.29ms max=15312.30ms p(90)=14198.41ms p(95)=14587.16ms
iterations.................: 80 1.333332/s
✓ successful_requests........: 100.00% ✓ 543 ✗ 0
vus........................: 1 min=1 max=20
vus_max....................: 20 min=20 max=20
This was tested on our 10k Reference Architecture with a RPS target of 20/s. It targets a path on one of two copies of a sanitised version of out gitlab-foss
project that has a high number of files (app/assets/images/emoji
- 1794 files). The equivalent API test is shown for context but it's performing within expectations as it's paginating.
Of note for this particular endpoint is that it dynamically loads in more logs as the user scrolls the page if there's many files. When this occurs the performance hit happens to the user essentially on each page scroll, degrading the experience further (you can see this live in effect here - https://gitlab.com/gitlab-org/gitlab/tree/master/app/assets/images/emoji).
As shown above the logs_tree
action is performing badly here with a TTFB P90 of around 2.7s. As per our performance targets this issue falls into the ~s2 tier.