
Figure out how to speed up initial indexing of groups

We've recently worked on several large rollouts of Elasticsearch indexing for customers. These are still taking many hours to index a few thousand projects. We should find out for sure what the bottleneck is and whether there is an easy way to speed things up. It's possible we could go much faster just by increasing Sidekiq concurrency, but we don't know for sure. Sidekiq is intentionally throttled for Elasticsearch to prevent overloading the cluster, but our cluster is much bigger now than it used to be and should handle much greater concurrency.
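As a rough way to reason about the concurrency question, here is a minimal back-of-the-envelope sketch. It assumes the Sidekiq workers are the only bottleneck and that every job takes about the same time, which is exactly what the monitoring still needs to confirm. The function name `estimated_hours` and all input numbers are hypothetical and are not measurements from any rollout.

```python
import math


def estimated_hours(num_jobs: int, avg_job_seconds: float, concurrency: int) -> float:
    """Idealised wall-clock estimate for draining a queue of equal-sized jobs.

    Assumes the workers are the only bottleneck: jobs run in "waves" of
    `concurrency` jobs, and each wave takes `avg_job_seconds`.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    waves = math.ceil(num_jobs / concurrency)
    return waves * avg_job_seconds / 3600


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    for concurrency in (2, 4, 8, 16):
        hours = estimated_hours(num_jobs=3000, avg_job_seconds=60.0, concurrency=concurrency)
        print(f"concurrency={concurrency:>2}: ~{hours:.2f} h")
```

If observed times stop improving in line with this estimate as concurrency goes up, that would suggest the bottleneck is somewhere else, such as the Elasticsearch cluster, Gitaly, or Postgres.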

Recent additions:

Right now, with these taking ~15 hours to do a few hundred groups, I'm a little hesitant to do larger batches because it holds up the queues for updates to these records during that time. We may be able to solve that by separating queues, and there are already some issues open to do that, but ultimately we will still have projects half indexed for long periods of time, so speeding it up would be preferable if there is an easy way to do that.
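To illustrate why a shared queue delays updates, here is a small toy model of a FIFO queue served by a fixed number of workers. It is only a sketch: it does not model Sidekiq's actual scheduling, the function name `start_time_of_last` is hypothetical, and the durations are made up. It also assumes that a separate queue would get its own dedicated worker.

```python
import heapq
from typing import Sequence


def start_time_of_last(durations: Sequence[float], workers: int) -> float:
    """Return when the last job in a FIFO queue starts running.

    Jobs are taken in order by whichever of the `workers` identical
    workers becomes free first.
    """
    free_at = [0.0] * workers  # time at which each worker is next idle
    heapq.heapify(free_at)
    start = 0.0
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available worker
        heapq.heappush(free_at, start + duration)
    return start


if __name__ == "__main__":
    bulk = [10.0] * 5   # hypothetical initial-indexing jobs
    update = [1.0]      # one incremental update arriving behind them

    # Shared queue: the update waits for the bulk backlog ahead of it.
    print("shared queue wait:", start_time_of_last(bulk + update, workers=2))

    # Separate queue with its own worker: the update starts immediately.
    print("separate queue wait:", start_time_of_last(update, workers=1))
```

Separating the queues removes the wait for updates in this model, but it does nothing to shorten the bulk run itself, which is why speeding up the initial indexing remains the preferred fix.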

Monitoring from last roll-out (2020-04-09 05:22:12 UTC - 2020-04-10 10:56 UTC)

Node monitoring on cluster

Instance 0

Screen_Shot_2020-04-14_at_5.22.44_pm

Screen_Shot_2020-04-14_at_5.23.09_pm

Instance 1

Screen_Shot_2020-04-14_at_5.24.34_pm

Screen_Shot_2020-04-14_at_5.24.45_pm

Instance 2

Screen_Shot_2020-04-14_at_5.25.23_pm

Screen_Shot_2020-04-14_at_5.25.42_pm

Instance 3

Screen_Shot_2020-04-14_at_5.26.10_pm

Screen_Shot_2020-04-14_at_5.26.24_pm

Instance 4

Screen_Shot_2020-04-14_at_5.26.48_pm

Screen_Shot_2020-04-14_at_5.27.00_pm

Instance 5

Screen_Shot_2020-04-14_at_5.27.14_pm

Screen_Shot_2020-04-14_at_5.27.27_pm

Index monitoring on cluster

Screen_Shot_2020-04-14_at_5.21.14_pm

Screen_Shot_2020-04-14_at_5.20.09_pm

Sidekiq detail monitoring

ElasticIndexerWorker

Screen_Shot_2020-04-14_at_5.32.11_pm

Screen_Shot_2020-04-14_at_5.32.35_pm

Screen_Shot_2020-04-14_at_5.32.47_pm

ElasticCommitIndexerWorker

Screen_Shot_2020-04-14_at_5.34.11_pm

Screen_Shot_2020-04-14_at_5.34.39_pm

Screen_Shot_2020-04-14_at_5.34.46_pm

Gitaly Monitoring

Screen_Shot_2020-04-14_at_5.42.45_pm

Screen_Shot_2020-04-14_at_5.43.35_pm

Screen_Shot_2020-04-14_at_5.44.23_pm

Postgres monitoring

Screen_Shot_2020-04-14_at_5.46.01_pm

Screen_Shot_2020-04-14_at_5.46.22_pm
