
Elasticsearch performance improvement, use routing when possible

Dmitry Gruzd requested to merge search-with-elasticsearch-routing into master

What does this MR do?

I've noticed that we don't use our routing field when querying, so each search request hits every available shard.

For a test ES index with 100 shards, searching issues in a project looks like this:

{"took":5,"timed_out":false,"_shards":{"total":100,"successful":100,"skipped":0,"failed":0},..

But all of a project's data is stored in a single shard (because our routing is based on the project_id), so we query 100 times as many shards as we need to process the query. This will matter even more as our cluster grows.
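To illustrate why all documents for one project end up on one shard, here is a minimal sketch of routing-based shard selection. It assumes the simplified formula from the Elasticsearch documentation, `shard = hash(routing) % number_of_primary_shards`. Elasticsearch hashes with Murmur3 internally; `Zlib.crc32` is only a stand-in here, so the shard numbers will not match a real cluster. The `shard_for` helper and the `project_42` routing value are hypothetical.

```ruby
require "zlib"

# Simplified model of Elasticsearch shard selection:
#   shard = hash(routing) % number_of_primary_shards
# CRC32 is a stand-in for the Murmur3 hash Elasticsearch really uses,
# so these shard numbers will NOT match a real cluster.
def shard_for(routing, num_shards)
  Zlib.crc32(routing.to_s) % num_shards
end

num_shards = 100

# Hypothetical routing value derived from the project id.
routing = "project_42"

# Every document indexed with the same routing value maps to the same shard.
shards = 5.times.map { shard_for(routing, num_shards) }.uniq
puts "shards holding #{routing}: #{shards.inspect}"

# Without routing a search fans out to all shards; with it, to one.
puts "shards queried without routing: #{num_shards}"
puts "shards queried with routing:    #{shards.size}"
```

Because the shard is a pure function of the routing value, a search that supplies the same value can skip every other shard.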

After enabling the feature flag with Feature.enable(:elasticsearch_use_routing), the same search hits only one shard:

{"took":1,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},..

That's a 5x improvement (5 ms -> 1 ms) on a local machine; the gain could be even larger on our cluster.

The change is behind a feature flag, so we can compare CPU utilization and average response times before and after enabling it.

I believe it's necessary to achieve our ES scaling goals, and it will help reduce our search response times.
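The flag-gated behaviour described above can be sketched roughly as follows. This is not the code from this MR: `search_options`, the `gitlab-test` index name, and the `project_<id>` routing format are hypothetical, and `use_routing` stands in for the result of checking the `elasticsearch_use_routing` feature flag. The resulting hash has the shape a client such as elasticsearch-ruby accepts for a search call, where `routing` corresponds to the `routing` parameter of the Elasticsearch search API.

```ruby
# Hypothetical helper: builds the options for a search call and adds
# `routing` only when the flag is on and a project id is known.
def search_options(index:, query:, project_id: nil, use_routing: false)
  options = { index: index, body: { query: query } }
  # Global searches (no project id) still have to hit every shard.
  options[:routing] = "project_#{project_id}" if use_routing && project_id
  options
end

query = { match: { title: "bug" } }

without_routing = search_options(index: "gitlab-test", query: query, project_id: 42)
with_routing = search_options(index: "gitlab-test", query: query, project_id: 42, use_routing: true)

p without_routing
p with_routing
```

Keeping the routing decision in one place makes it easy to turn the flag on and off and compare the `_shards.total` value in the responses.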

#193172 (closed)

Screenshots

Screenshot_2019-12-25_at_18.48.35

Related documentation

Searching with custom routing

