SPIKE: Find a better Elasticsearch routing strategy to help speed up group level search
Problem Statement
GitLab Advanced Search provides interfaces that allow users to search within a project, within a group, and across the whole GitLab instance. Compared with project-level search, group-level search is much slower. One likely cause is how we handle routing: the routing value is built from the project ID, so the indexed data for a given project most likely lives on a single shard in the Elasticsearch cluster, while the projects under a group may be spread across different shards. A group-level query therefore probably has to fan out to many shards and merge their results, which would plausibly take longer.
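To illustrate the suspected fan-out, here is a minimal Python sketch of hash-based routing. It is a simplification and rests on several assumptions: `zlib.crc32` stands in for Elasticsearch's real routing hash (which is Murmur3-based), the `project_<id>` and `group_<id>` routing strings are hypothetical formats, and the shard count and project IDs are made up.

```python
import zlib


def shard_for(routing: str, num_shards: int) -> int:
    """Map a routing value to a shard number.

    Simplified model: crc32 is only a stand-in for the hash that
    Elasticsearch actually applies to the routing value.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def shards_touched(routing_values, num_shards: int) -> set:
    """Return the set of shards a query must visit for these routing values."""
    return {shard_for(value, num_shards) for value in routing_values}


NUM_SHARDS = 5  # made-up shard count

# Hypothetical group containing 50 projects, each routed by its own project ID.
project_routing = [f"project_{pid}" for pid in range(1, 51)]

# Project-level search: one routing value, so exactly one shard.
print(len(shards_touched(["project_1"], NUM_SHARDS)))

# Group-level search under per-project routing: potentially every shard.
print(len(shards_touched(project_routing, NUM_SHARDS)))

# Group-level search if every document were routed by a (hypothetical) group key.
print(len(shards_touched(["group_7"] * 50, NUM_SHARDS)))
```

In this model a single routing value always resolves to one shard, while a set of distinct per-project values can resolve to any number of shards up to the total, which matches the behaviour described above.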
Proposal
We may want to find a way to place data that belongs to the same group on one shard, or on a small number of shards, in order to reduce the latency caused by assembling query results from many shards. At the same time, we must avoid putting too much data on a single shard, which could cause performance issues of its own.
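One way to reason about this trade-off is a partitioned routing scheme, where a group's documents are spread over a small, fixed number of shards instead of exactly one. The Python sketch below models that idea; it is one possible direction to evaluate, not a chosen design. Elasticsearch offers a comparable index setting, `index.routing_partition_size`, but the arithmetic here is simplified, `zlib.crc32` is a stand-in hash, and the group key, document IDs, and sizes are all hypothetical.

```python
import zlib


def _hash(value: str) -> int:
    # Stand-in hash; Elasticsearch uses a different (Murmur3-based) function.
    return zlib.crc32(value.encode("utf-8"))


def partitioned_shard(routing: str, doc_id: str,
                      num_shards: int, partition_size: int) -> int:
    """Pick a shard from a small window of shards selected by the routing value.

    The routing value fixes where the window starts; the document ID
    chooses one of `partition_size` slots inside that window.
    """
    offset = _hash(doc_id) % partition_size
    return (_hash(routing) + offset) % num_shards


NUM_SHARDS = 12      # made-up shard count
PARTITION_SIZE = 3   # made-up upper bound on shards per group

# Hypothetical documents that all belong to one group.
doc_ids = [f"issue_{n}" for n in range(1000)]
group_shards = {
    partitioned_shard("group_7", doc_id, NUM_SHARDS, PARTITION_SIZE)
    for doc_id in doc_ids
}

# The group's data lands on at most PARTITION_SIZE shards.
print(sorted(group_shards))
```

With a partition size of 1 this collapses to plain group-level routing (all of a group's data on one shard); larger values trade a wider search fan-out for a more even data distribution, which is the balance the proposal needs to find.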
Additional Information
This page may contain information related to upcoming products, features and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes. Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.