Skip to content

Set 1s server side timeout on Elasticsearch counts

What does this MR do?

These count requests are loaded one per tab every time the search page loads. This means a single search for one type of document will trigger up to 7 other searches just to get the counts for the other tabs.

These tab counts are often incredibly expensive requests too especially relative to the cheaper searches. For example an issue search may take 1s while a blobs count will take 30s. Due to a limited thread pool on the Elasticsearch side we regularly see these count queries being the cause of queuing which is slowing down otherwise fast searches on GitLab.com.

As such we want to set a timeout on these. This timeout is just a server side Elasticsearch timeout for now which is a soft limit because Elasticsearch is asynchronous and it may actually take Elasticsearch longer to realise it's timed out and cancel the query. As such we may see searches take a few seconds before they timeout even though the timeout is 1s. This is not perfect but benchmarking in the related issue shows this still can drastically improve throughput and this is one of the easiest steps to take now.

One thing to also note about this approach is that users will still see a count in the event of a timeout. The count may be a partial count and actually lower than the true count. If they switch to the tab they will see a true count. I think this is probably still better than displaying nothing since the main value the tab counts have is showing whether or not there are searches on that tab at all.

Later we may wish to introduce client side timeouts on our ES client but it's trickier to accomplish since we use a single client configuration which has a global timeout for all Elasticsearch queries. Additionally client side timeouts will result in errors that we may wish to handle specially to show some indicator on the tab.

Read more at #301146 (closed)

Screenshots (strongly suggested)

This is what tab counts look like in GitLab but this MR won't change anything except for the numbers may be sometimes partial counts that are lower than the true value:

Screen_Shot_2021-02-05_at_4.19.59_pm

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Related to #301146 (closed)

Edited by Dylan Griffith

Merge request reports