
Improve performance of Get the diff of a commit API under load into next tier


The TTFB (Time to First Byte) of the Get the diff of a commit API exceeds the performance target under load on the 1k and 2k environments:

2k - https://gitlab.com/gitlab-org/quality/performance/-/wikis/Benchmarks/Latest/2k:

* Environment:                2k
* Environment Version:        16.9.0-pre `79a1c391322`
* Option:                     60s_40rps
* Date:                       2024-02-12
* Run Time:                   1h 37m 54.66s (Start: 04:47:59 UTC, End: 06:25:53 UTC)
* GPT Version:                v2.13.0

NAME                                                     | RPS  | RPS RESULT         | TTFB AVG   | TTFB P90              | REQ STATUS     | RESULT  
---------------------------------------------------------|------|--------------------|------------|-----------------------|----------------|---------
api_v4_projects_repository_commits_commit_diff           | 40/s | 36.84/s (>32.00/s) | 924.40ms   | 1122.07ms (<850ms)    | 100.00% (>99%) | FAILED¹²

During the test run, CPU usage on the Gitaly node spikes to 100%:

Screenshot_2024-02-16_at_14.05.02

* Environment:                1k
* Environment Version:        16.10.0-pre `ea71b210ca7`
* Option:                     60s_20rps
* Date:                       2024-02-16
* Run Time:                   1h 26m 38.29s (Start: 04:41:47 UTC, End: 06:08:26 UTC)
* GPT Version:                v2.14.0

NAME                                                     | RPS  | RPS RESULT         | TTFB AVG  | TTFB P90              | REQ STATUS     | RESULT  
---------------------------------------------------------|------|--------------------|-----------|-----------------------|----------------|---------
api_v4_projects_repository_commits_commit_diff           | 20/s | 16.4/s (>16.00/s)  | 983.08ms  | 1294.98ms (<850ms)    | 100.00% (>99%) | FAILED¹²

On the 10k environment the results are better:

* Environment:                10k
* Environment Version:        16.10.0-pre `ea71b210ca7`
* Option:                     60s_200rps
* Date:                       2024-02-16
* Run Time:                   1h 30m 12.09s (Start: 05:04:59 UTC, End: 06:35:11 UTC)
* GPT Version:                v2.14.0

NAME                                                     | RPS   | RPS RESULT           | TTFB AVG  | TTFB P90              | REQ STATUS     | RESULT 
---------------------------------------------------------|-------|----------------------|-----------|-----------------------|----------------|--------
api_v4_projects_repository_commits_commit_diff           | 200/s | 196.78/s (>160.00/s) | 207.60ms  | 227.42ms (<850ms)     | 100.00% (>99%) | Passed¹
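The verdicts in the three tables above appear to follow from the thresholds printed in parentheses next to each value. The Python sketch below reproduces that reading. The 80% RPS ratio is an assumption inferred from the rows (32 of 40, 16 of 20, 160 of 200), not taken from the GitLab Performance Tool source, and `ResultRow` and `failed_checks` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ResultRow:
    """One row of the results table (hypothetical structure)."""
    environment: str
    target_rps: float
    achieved_rps: float
    ttfb_p90_ms: float
    success_pct: float


def failed_checks(row, rps_ratio=0.8, ttfb_p90_limit_ms=850.0, min_success_pct=99.0):
    """Return the names of the thresholds this row misses (empty list = pass).

    Assumed thresholds, read from the parenthesised values in the tables:
    achieved RPS above rps_ratio * target, TTFB P90 below 850 ms,
    and success rate above 99%.
    """
    failures = []
    if row.achieved_rps <= row.target_rps * rps_ratio:
        failures.append("rps")
    if row.ttfb_p90_ms >= ttfb_p90_limit_ms:
        failures.append("ttfb_p90")
    if row.success_pct <= min_success_pct:
        failures.append("success_rate")
    return failures


# Values copied from the 2k, 1k and 10k tables above.
rows = [
    ResultRow("2k", 40, 36.84, 1122.07, 100.0),
    ResultRow("1k", 20, 16.4, 1294.98, 100.0),
    ResultRow("10k", 200, 196.78, 227.42, 100.0),
]

for row in rows:
    missed = failed_checks(row)
    verdict = "Passed" if not missed else "FAILED (" + ", ".join(missed) + ")"
    print(f"{row.environment}: {verdict}")
```

Under these assumptions the 1k and 2k rows fail only on TTFB P90, while throughput and request status stay within their thresholds.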

https://10k.testbed.gitlab.net/-/grafana/d/J0ysCtCWz/gpt-test-results?orgId=1&var-test_name=api_v4_projects_repository_commits_commit_diff&from=now-90d&to=now

Description before 2024-02-16

The endpoint used to perform consistently with a TTFB P90 of around 170ms, under our main target of 200ms, but lately it has crept up to around 190ms and is starting to fail outright on some environments.

Screenshot_2023-07-12_at_13.23.20

Screenshot_2023-07-12_at_13.44.59

Example commit page that's used in the test - https://staging.gitlab.com/api/v4/projects/gpt%2Flarge_projects%2Fgitlabhq1/repository/commits/8f9beefa/diff

Corresponding web page for API test - https://staging.gitlab.com/gpt/large_projects/gitlabhq1/-/commit/8f9beefa - sometimes has long running Gitaly responses


Screenshot_2023-07-10_at_16.28.24

Quoting @robotmay_gitlab from internal Slack discussion with initial review:

Looks like Gitaly loads all the feature flags into memory in one go so if something's clearing that cache out it could cause those odd spikes on random requests

Test Details

Testing was done on our 10k Reference Architecture environment with our lab-condition GitLab Performance Tool pipeline. The project being tested is a copy of gitlabhq (tarball can be found here). Details of the GitLab Performance Tool tests are listed on the Current test details page.

The latest GitLab Performance pipeline results can always be found here. Full Server Metrics are available via the Metrics Dashboard link on that page.

As per our performance targets, this endpoint's TTFB metric is above the target of 1000 ms, which is severity3. The task is to improve the endpoint's performance into the next tier.
