Advanced Search: Improve indexing throughput

Background

We can't process updates fast enough on the application side, while our Elasticsearch cluster is capable of processing a lot more data.

A spike in indexing can cause our non-code indexing queue to grow considerably, and it takes a while to drain. That means users have to wait quite some time until their updates show up in Advanced Search.

I see multiple causes for that:

  • We're indexing more and more data, which means there are more regular updates, but our indexing throughput has stayed the same
  • We've added new document types to Advanced Search (users), and plan to add others

Our non-code indexing throughput has stayed mostly constant. When we first introduced Elastic::ProcessBookkeepingService, I believe the limit was 10,000 documents per iteration. After we introduced shards, we can index 16,000 documents per execution, and that is our hard limit. Currently, even processing those 16,000 documents takes 5-10 minutes, which is not sustainable with the current indexing demand.
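
For context, some back-of-the-envelope arithmetic on the figures above (the even per-shard split is my assumption, based on the 16 shards described below):

```ruby
# Rough arithmetic on the numbers quoted above; illustrative only.
hard_cap = 16_000             # documents per execution (the hard limit)
shards   = 16
hard_cap / shards             # => 1_000 documents per shard, assuming an even split

# At 5-10 minutes per execution, effective throughput tops out at:
[5, 10].map { |minutes| hard_cap / minutes }
# => [3200, 1600] documents per minute, regardless of how much more the
#    Elasticsearch cluster could absorb
```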

Solution

The suggestion of this issue is to process each shard in parallel, which should decrease the time we spend processing 16,000 documents by a factor of up to 16. If we end up in the same situation later and need to increase throughput further, we can bump up the number of shards.

Currently, we use 16 sharded zsets to track all the updates for the Advanced Search integration, with one cron worker for the initial queue (ElasticIndexInitialBulkCronWorker) and one for the incremental queue (ElasticIndexBulkCronWorker).
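
For reference, a minimal sketch of the current single-worker flow, assuming each shard is its own zset that the service walks sequentially within one execution (method names and the even per-shard split are assumptions, not the actual implementation):

```ruby
# Illustrative stand-in for the current behaviour of
# Elastic::ProcessBookkeepingService; not the real GitLab code.
SHARDS_COUNT    = 16
PER_SHARD_LIMIT = 1_000

# Hypothetical helpers for the Redis zset read and the Elasticsearch
# bulk request that happen for each shard.
def pop_refs(shard, limit)
  Array.new(limit) { |i| "shard-#{shard}-ref-#{i}" }
end

def bulk_index(refs)
  # one _bulk request to Elasticsearch in the real service
end

# A single cron execution (ElasticIndexBulkCronWorker or
# ElasticIndexInitialBulkCronWorker) drains all shards one after another,
# so the wall-clock time is the sum of all 16 per-shard round trips.
def process_all_shards
  (0...SHARDS_COUNT).sum do |shard|
    refs = pop_refs(shard, PER_SHARD_LIMIT)
    bulk_index(refs)
    refs.size
  end
end

process_all_shards # => 16_000 documents handled in one sequential pass
```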

I think we should:

  • Parallelize the work by scheduling a worker for each shard. For instance, ElasticIndexInitialBulkCronWorker should schedule 16 workers, each with its shard number as an argument (a rough sketch follows this list).
    • Add this new argument to the locking mechanism in Elastic::BulkCronWorker
    • Change Elastic::ProcessBookkeepingService to support passing a shard number into the execute method.
  • Schedule a worker with the shard number passed to it right away if we still have data to process (this has been done already)

  • Bump up the SHARD_LIMIT and potentially add it to the admin UI
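
Putting the pieces together, here is a rough sketch of what the fan-out could look like, assuming one job per shard, a per-shard lock, and a shard argument on the bookkeeping service's execute method. The class names ending in Sketch and all method bodies are illustrative plain-Ruby stand-ins, not the actual GitLab code:

```ruby
# Illustrative sketch only: plain-Ruby stand-ins for what would be
# Sidekiq workers and Elastic::ProcessBookkeepingService in the real app.
SHARDS_COUNT = 16

class BookkeepingServiceSketch
  # Hypothetical per-shard execute: drains up to `limit` document
  # references from the given shard's zset and returns how many
  # references are still pending on that shard.
  def self.execute(shard:, limit: 1_000)
    # ...pop refs from the shard's zset, send one bulk request...
    0 # pretend the shard is now empty
  end
end

class PerShardBulkWorkerSketch
  # In the real implementation this would be a Sidekiq worker, and the
  # dedup/lock key in Elastic::BulkCronWorker would include the shard
  # number so two jobs for the same shard never overlap.
  def perform(shard)
    remaining = BookkeepingServiceSketch.execute(shard: shard)

    # Re-enqueue for the same shard right away if there is still data,
    # instead of waiting for the next cron tick.
    perform(shard) if remaining.positive? # `perform_async(shard)` with Sidekiq
  end
end

class BulkCronWorkerSketch
  # The cron worker stops processing documents itself and only fans out
  # one job per shard, so the 16 shards can be drained concurrently.
  def perform
    SHARDS_COUNT.times do |shard|
      PerShardBulkWorkerSketch.new.perform(shard) # `perform_async(shard)` with Sidekiq
    end
  end
end

BulkCronWorkerSketch.new.perform
```

With this shape, the wall-clock time per cron tick is bounded by the slowest shard rather than the sum of all 16, and bumping the shard count later scales throughput further.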
