Figure out how to speed up initial indexing of groups
We've recently worked on several large rollouts of Elasticsearch indexing for customers. These still take many hours to index a few thousand projects. We should find out for sure what the bottleneck is and whether there is some easy way to speed things up. It's possible we could go much faster just by increasing Sidekiq concurrency, but we don't know for sure. Sidekiq is intentionally throttled for Elasticsearch to prevent overloading the cluster, but our cluster is much bigger now than it used to be and should handle much greater concurrency.
Recent additions:
Right now, with these taking ~15 hours to do a few hundred groups, I'm a little hesitant to do larger batches because they hold up the queues for updates to these records during that time. We may be able to solve that by separating queues, and there are already some issues open to do that, but ultimately we will still have projects half indexed for long periods of time, so speeding it up would be preferable if there is an easy way to do that.
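As a rough way to reason about the concurrency hypothesis, here is a minimal back-of-the-envelope sketch. It assumes Sidekiq concurrency is the only bottleneck and that throughput scales linearly with it, which is exactly what we have not verified. The group count and concurrency values are hypothetical placeholders, not measurements; only the ~15 hours comes from the roll-out above.

```python
def estimate_hours(items, observed_hours, current_concurrency, new_concurrency):
    """Project indexing time under a different worker concurrency.

    Assumes throughput scales linearly with concurrency, i.e. that
    Elasticsearch, Gitaly and Postgres are not the limiting factor.
    """
    # Items processed per hour by a single worker in the observed run.
    per_worker_rate = items / observed_hours / current_concurrency
    return items / (per_worker_rate * new_concurrency)


# Hypothetical placeholders: "a few hundred groups" in ~15 hours.
GROUPS = 300
OBSERVED_HOURS = 15.0
CURRENT_CONCURRENCY = 2  # assumed value, not the real Sidekiq setting

for concurrency in (2, 4, 8, 16):
    hours = estimate_hours(GROUPS, OBSERVED_HOURS, CURRENT_CONCURRENCY, concurrency)
    print(f"concurrency={concurrency:>2}: ~{hours:.1f} h")
```

If a real run at higher concurrency comes in well above the projected figure, that would suggest the bottleneck is somewhere other than Sidekiq.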
Monitoring from last roll-out (2020-04-09 05:22:12 UTC - 2020-04-10 10:56 UTC)
Node monitoring on cluster
Instance 0
Instance 1
Instance 2
Instance 3
Instance 4
Instance 5
Index monitoring on cluster
Sidekiq detail monitoring
ElasticIndexerWorker
ElasticCommitIndexerWorker
Gitaly Monitoring
Postgres monitoring