Merge Request Discussions API degrades with comments count

With our increased performance testing efforts, we've started to identify slow areas in GitLab and raise them as issues. One such area is the Merge Request Discussions API, whose response time degrades even when the number of discussions is not especially large (150+). Other API endpoints in the same area don't show the same degradation:

* Environment:                10k
* Environment Version:        12.9.0-pre `8705340a877`
* Option:                     60s_200rps
* Date:                       2020-02-25
* GPT Version:                v1.2

NAME                                                     | RPS   | RPS RESULT           | TTFB AVG  | TTFB P90             | REQ STATUS     | RESULT 
---------------------------------------------------------|-------|----------------------|-----------|----------------------|----------------|--------
api_v4_projects_merge_requests_merge_request_changes     | 200/s | 195.8/s (>160.00/s)  | 63.97ms   | 68.11ms (<500ms)     | 100.00% (>95%) | Passed 
api_v4_projects_merge_requests_merge_request_commits     | 200/s | 195.68/s (>160.00/s) | 49.33ms   | 52.12ms (<500ms)     | 100.00% (>95%) | Passed 
api_v4_projects_merge_requests_merge_request_discussions | 200/s | 82.85/s (>32.00/s)   | 2169.35ms | 3444.71ms (<5000ms)  | 100.00% (>95%) | Passed¹
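
To make the table easier to read, here is a minimal sketch of the pass check implied by the thresholds shown in parentheses. It assumes a row passes when its achieved RPS is above the RPS threshold, its TTFB P90 is below the TTFB threshold, and its success rate is above 95%. The `Row` class and `passes` function are hypothetical names for illustration, not part of the Performance Toolkit. The figures are copied from the table above.

```python
from dataclasses import dataclass


@dataclass
class Row:
    name: str
    rps_result: float    # achieved requests per second
    ttfb_p90_ms: float   # 90th percentile time to first byte, in milliseconds
    success_pct: float   # percentage of successful requests


def passes(row, min_rps, max_ttfb_p90_ms, min_success_pct=95.0):
    """Assumed pass rule: every metric must clear its threshold."""
    return (
        row.rps_result > min_rps
        and row.ttfb_p90_ms < max_ttfb_p90_ms
        and row.success_pct > min_success_pct
    )


discussions = Row(
    "api_v4_projects_merge_requests_merge_request_discussions",
    rps_result=82.85,
    ttfb_p90_ms=3444.71,
    success_pct=100.0,
)

# Thresholds listed in the discussions row itself (>32.00/s, <5000ms).
print(passes(discussions, min_rps=32.0, max_ttfb_p90_ms=5000.0))   # True

# Thresholds listed for the changes and commits rows (>160.00/s, <500ms).
print(passes(discussions, min_rps=160.0, max_ttfb_p90_ms=500.0))   # False
```

Under that assumed rule, the discussions endpoint passes only against the looser thresholds shown in its own row. Against the thresholds applied to the other two endpoints it would fail on both RPS and TTFB P90.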

Also, looking through the Kibana logs, we can see requests that take more than 6 seconds to complete (and two requests that take more than 16 seconds).

The endpoint also causes heavy CPU usage on the Rails nodes, reaching 90%. This is only 90% because the servers run Puma with workers set to 90% of the CPU. On Unicorn this would be 100%.

Steps to reproduce

Check out the Performance Toolkit
Run the specific test with the `run-k6` command. For example, against the 10k environment you would run the following from the project root: `ACCESS_TOKEN="" ./run-k6 -e environments/10k.json -s scenarios/60s_200rps.json -t tests/api/api_v4_projects_merge_requests_merge_request_discussions.js`, where `ACCESS_TOKEN` is a valid GitLab Personal Access Token for the specified environment (10k in this case). The token should come from a user that has admin access to the project(s) to be tested, and it should have API and read_repository permissions.
If you want to run the test against your own environment, the Toolkit's documentation has details on how to do so.
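
For a quick single-request spot check outside the Toolkit, the endpoint can also be timed directly. The following is a rough sketch rather than a replacement for the k6 test. It uses GitLab's `GET /projects/:id/merge_requests/:merge_request_iid/discussions` endpoint and the `PRIVATE-TOKEN` header. The host `gitlab.example.com`, the project path `group/project`, the merge request IID `1`, and the helper names are all placeholders chosen for illustration.

```python
import time
import urllib.parse
import urllib.request


def discussions_url(base_url, project, mr_iid, per_page=20, page=1):
    """Build the URL for one page of a merge request's discussions."""
    # A project may be given by numeric ID or by path; paths must be URL-encoded.
    project_ref = urllib.parse.quote(str(project), safe="")
    query = urllib.parse.urlencode({"per_page": per_page, "page": page})
    return (
        f"{base_url.rstrip('/')}/api/v4/projects/{project_ref}"
        f"/merge_requests/{mr_iid}/discussions?{query}"
    )


def time_to_headers(url, token):
    """Return seconds until the response headers arrive (an approximation of TTFB)."""
    request = urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        elapsed = time.perf_counter() - start
        response.read()  # drain the body so the connection closes cleanly
    return elapsed


# Placeholder values: substitute a real host, project, and merge request IID.
url = discussions_url("https://gitlab.example.com", "group/project", 1)
print(url)
```

Calling `time_to_headers(url, token)` with a real URL and token for merge requests with different comment counts should give a rough idea of how the response time scales. A single request will not reproduce the load of the 200 RPS scenario.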

What is the current bug behavior?

The API slows down as the number of comments in the MR grows.

What is the expected correct behavior?

The API should respond as quickly as the other MR endpoints, since it returns only 20 results per page.
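
The reasoning behind this expectation can be illustrated with a toy model. This is not GitLab's implementation, and the cause of the slowdown has not been identified here. The sketch only shows why a paginated endpoint is expected to cost about the same regardless of the total item count, and how that expectation breaks if per-item work happens before the page is cut. All names are hypothetical.

```python
PAGE_SIZE = 20


def expensive_render(item, counter):
    """Stand-in for any per-item work; counts how often it is invoked."""
    counter["calls"] += 1
    return {"id": item}


def page_after_processing_all(items, page, counter):
    # Work is done for every item, then the page is sliced out: cost grows with len(items).
    rendered = [expensive_render(item, counter) for item in items]
    start = (page - 1) * PAGE_SIZE
    return rendered[start:start + PAGE_SIZE]


def process_only_requested_page(items, page, counter):
    # The page is sliced first, so work is bounded by PAGE_SIZE.
    start = (page - 1) * PAGE_SIZE
    return [expensive_render(item, counter) for item in items[start:start + PAGE_SIZE]]


for total in (20, 150, 1500):
    items = list(range(total))
    all_first, page_first = {"calls": 0}, {"calls": 0}
    a = page_after_processing_all(items, 1, all_first)
    b = process_only_requested_page(items, 1, page_first)
    assert a == b  # both strategies return the same page
    print(total, all_first["calls"], page_first["calls"])
# 20 20 20
# 150 150 20
# 1500 1500 20
```

Both strategies return identical pages, but only the second keeps the amount of work constant as the total grows, which is the behavior the other MR endpoints appear to show.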

Edited Feb 25, 2020 by Grant Young