A low rate of requests to the search API can cause an Apdex drop and degraded performance on a Gitaly shard
In https://gitlab.com/gitlab-com/gl-infra/production/-/issues/5031 we saw a file node degrade because a small set of users (~15) were hitting the search API for a single repository, at a combined rate of about 5 requests per minute. The impact was large: it caused an Apdex drop for an entire Gitaly shard.
These search requests over the public API resulted in gRPC timeouts at Rails, so no useful information was returned to the callers. They still put heavy load on the Gitaly server, spawning many short-lived git processes (example).
Ideally, we would throttle these requests better at Gitaly, or throttle them upstream, so that requests of this type cannot impact performance.
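To make the throttling idea concrete, here is a minimal sketch of a per-repository concurrency cap, written in Go. It is only an illustration under assumptions: `RepoLimiter`, `NewRepoLimiter`, `TryAcquire`, `Release`, and the repository path are hypothetical names invented for this example, and none of it reflects how Gitaly or Rails actually implement limits.

```go
package main

import (
	"fmt"
	"sync"
)

// RepoLimiter caps the number of in-flight requests per repository.
// Hypothetical type for illustration only; not Gitaly's actual code.
type RepoLimiter struct {
	mu       sync.Mutex
	max      int
	inFlight map[string]int
}

// NewRepoLimiter returns a limiter allowing at most max concurrent
// requests for any single repository.
func NewRepoLimiter(max int) *RepoLimiter {
	return &RepoLimiter{max: max, inFlight: make(map[string]int)}
}

// TryAcquire reserves a slot for repo. It returns false, without
// blocking, when the repository is already at its cap.
func (l *RepoLimiter) TryAcquire(repo string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.inFlight[repo] >= l.max {
		return false
	}
	l.inFlight[repo]++
	return true
}

// Release frees a slot previously reserved with TryAcquire.
func (l *RepoLimiter) Release(repo string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.inFlight[repo] > 0 {
		l.inFlight[repo]--
	}
	if l.inFlight[repo] == 0 {
		// Drop idle entries so the map does not grow without bound.
		delete(l.inFlight, repo)
	}
}

func main() {
	limiter := NewRepoLimiter(2)
	repo := "group/project.git" // hypothetical repository path

	// Three searches arrive while none has finished yet.
	for i := 1; i <= 3; i++ {
		if limiter.TryAcquire(repo) {
			fmt.Printf("request %d: accepted\n", i)
		} else {
			fmt.Printf("request %d: rejected (repository at cap)\n", i)
		}
	}

	// One search finishes, which frees a slot for the next caller.
	limiter.Release(repo)
	fmt.Println("after release:", limiter.TryAcquire(repo))
}
```

With a cap of 2, the third concurrent search for the same repository is rejected immediately instead of spawning more git processes. Other repositories on the shard are unaffected because the count is keyed by repository.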
- Is there any optimization we can make at Gitaly to make these requests less expensive, especially when multiple requests arrive in quick succession?
- Does it make sense that we are seeing gRPC timeouts at Rails while the requests are succeeding at Gitaly? Do we need to adjust timeouts? (see https://gitlab.com/gitlab-com/gl-infra/production/-/issues/5031#note_613929449)
- How exposed are we with the /search API endpoint if it is this simple to degrade performance by searching a single project?
For now, labeling this as Category:Gitaly until we decide where it should go.