Improve performance of Branches List with Search API under load into next tier
After being highlighted by a potential customer, it was found that the Branches List API performs markedly worse with the `search` parameter specified than without it:
█ Results summary
* Environment: 10k
* Environment Version: 13.10.0-pre `0bee278055f`
* Option: 60s_200rps
* Date: 2021-03-18
* Run Time: 4m 16.25s (Start: 16:19:24 UTC, End: 16:23:40 UTC)
* GPT Version: v2.6.1
NAME | RPS | RPS RESULT | TTFB AVG | TTFB P90 | REQ STATUS | RESULT
-------------------------------------------|-------|---------------------|-----------|----------------------|----------------|----------------
api_v4_projects_repository_branches | 200/s | 195.8/s (>160.00/s) | 105.92ms | 122.40ms (<500ms) | 100.00% (>99%) | Passed
api_v4_projects_repository_branches_search | 200/s | 134.2/s (>64.00/s) | 1346.20ms | 1838.55ms (<11000ms) | 100.00% (>99%) | Passed
The testing was done on our test 10k Reference Architecture environment at the standard 200 RPS. The project under test was a copy of gitlabhq (tarball can be found here), which has around 2100 branches.
The search term used in this test was `stable`, as numerous branches have this term in their name. However, testing with other terms returned similar results, suggesting that there may be a performance issue here that's not directly related to the term or the number of results it returns.
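For reference, the two requests being compared can be sketched as below. This is a minimal illustration only: the host and project path are hypothetical placeholders, and the helper `branches_url` is not part of GPT or GitLab. It only builds the URLs for the Branches List endpoint with and without the `search` parameter.

```python
from urllib.parse import quote, urlencode


def branches_url(base_url, project, search=None):
    """Build the Branches List API URL, optionally with a search term."""
    # The API accepts a URL-encoded namespaced path in place of a numeric ID,
    # so the slash in the path must be escaped as well (safe="").
    project_ref = quote(str(project), safe="")
    url = f"{base_url.rstrip('/')}/api/v4/projects/{project_ref}/repository/branches"
    if search:
        url += "?" + urlencode({"search": search})
    return url


BASE = "https://gitlab.example.com"  # hypothetical host (assumption)
PROJECT = "gpt/gitlabhq"             # hypothetical project path (assumption)

plain_url = branches_url(BASE, PROJECT)                     # fast case
search_url = branches_url(BASE, PROJECT, search="stable")   # slow case

print(plain_url)
print(search_url)
```

Only the query string differs between the two requests, which is why the gap in response time points at how the search itself is served.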
Looking at the environment metrics, it's clear that the Gitaly nodes are taking the brunt here.
When highlighted by the customer, the endpoint was performing much worse than above. This was determined to be due to that specific environment being tested at a much higher rate than expected, leading to what appeared to be a compounding effect. In our own testing we see something similar on our 50k environment, tested at 1000 RPS, where the results show the endpoint degrading more substantially under higher pressure, even against a bigger environment:
█ Results summary
* Environment: 50k
* Environment Version: 13.11.0-pre `25903c136cb`
* Option: 60s_1000rps
* Date: 2021-03-22
* Run Time: 4m 11.76s (Start: 12:21:58 UTC, End: 12:26:10 UTC)
* GPT Version: v2.6.1
NAME | RPS | RPS RESULT | TTFB AVG | TTFB P90 | REQ STATUS | RESULT
-------------------------------------------|--------|----------------------|-----------|----------------------|----------------|----------------
api_v4_projects_repository_branches | 1000/s | 991.12/s (>800.00/s) | 99.29ms | 111.43ms (<500ms) | 100.00% (>99%) | Passed
api_v4_projects_repository_branches_search | 1000/s | 284.71/s (>240.00/s) | 3229.44ms | 4815.56ms (<11000ms) | 100.00% (>99%) | Passed
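The compounding effect can be put into numbers using only the figures from the two results tables above. The sketch below is illustrative arithmetic, not part of GPT; the dictionary layout and function names are made up for this example.

```python
# TTFB averages (ms) and RPS figures copied from the two results tables.
results = {
    "10k @ 200 RPS": {
        "plain_ttfb": 105.92, "search_ttfb": 1346.20,
        "target_rps": 200, "search_rps": 134.2,
    },
    "50k @ 1000 RPS": {
        "plain_ttfb": 99.29, "search_ttfb": 3229.44,
        "target_rps": 1000, "search_rps": 284.71,
    },
}


def slowdown(run):
    """How many times slower the search endpoint is (TTFB avg)."""
    return run["search_ttfb"] / run["plain_ttfb"]


def throughput_fraction(run):
    """Share of the requested RPS the search endpoint actually achieved."""
    return run["search_rps"] / run["target_rps"]


for name, run in results.items():
    print(f"{name}: {slowdown(run):.1f}x slower TTFB avg, "
          f"{throughput_fraction(run):.0%} of target RPS")
# 10k @ 200 RPS: 12.7x slower TTFB avg, 67% of target RPS
# 50k @ 1000 RPS: 32.5x slower TTFB avg, 28% of target RPS
```

The plain endpoint stays at roughly 100ms on both environments, while the gap for the search variant widens from about 13x to about 33x at the higher rate.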
Performance targets, though, are measured against the 10k environment for consistency, so this issue falls into the severity3 performance targets tier. The hope is that any improvements will filter up to the larger environments and reduce the compounding effect.
Of note, compared to the previous issue, the endpoint is showing a heavier strain on the GitLab Rails nodes (CPU specifically).
The task is to improve performance further into the severity4 tier (<1000ms).
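As a rough guide to the size of the improvement needed on the 10k environment, the sketch below computes the minimum reduction required to get under 1000ms. Which metric the tier is judged on is an assumption here, so both the TTFB average and the TTFB P90 from the 10k results are shown; the function name is illustrative only.

```python
TARGET_MS = 1000.0  # upper bound of the severity4 tier stated above

# 10k environment figures for the search endpoint, from the results table.
measured_ms = {"TTFB avg": 1346.20, "TTFB P90": 1838.55}


def required_reduction(current_ms, target_ms=TARGET_MS):
    """Fraction by which response time must drop to reach the target."""
    return max(0.0, 1.0 - target_ms / current_ms)


for metric, value in measured_ms.items():
    print(f"{metric}: {value:.2f}ms needs more than "
          f"{required_reduction(value):.1%} reduction")
# TTFB avg: 1346.20ms needs more than 25.7% reduction
# TTFB P90: 1838.55ms needs more than 45.6% reduction
```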