ElasticSearch indexing on GitLab.com for 9.0
Meta-issue: https://gitlab.com/gitlab-org/gitlab-ce/issues/27084
For GitLab 8.17, I'd like to enable ElasticSearch indexing only, primarily to gather data about how it performs, spot any bottlenecks, and learn what we can do to make it usable on GitLab.com for 9.0.
The main change in GitLab 8.17 is changing the "backfill" indexing of project repositories to run inside Sidekiq, and having indexing performed on git push cause that backfill to be skipped as unnecessary: https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/1144
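The push/backfill interaction can be pictured roughly as follows. This is a minimal Python sketch, not the code in the merge request: it assumes each project records the last commit that was written to the index (`indexed_sha`, a hypothetical field), so the backfill can skip any project that a push has already brought up to date.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Project:
    id: int
    head_sha: Optional[str]            # current default-branch commit; None for an empty repo
    indexed_sha: Optional[str] = None  # hypothetical: last commit written to the index


def needs_backfill(project: Project) -> bool:
    """True unless the project is empty or already indexed (e.g. by a push) up to HEAD."""
    if project.head_sha is None:
        return False  # nothing to index
    return project.indexed_sha != project.head_sha


def backfill(projects: List[Project]) -> List[int]:
    """Return the ids a hypothetical backfill job would index, marking each as done."""
    indexed = []
    for project in projects:
        if needs_backfill(project):
            # Stand-in for the real repository indexing call.
            project.indexed_sha = project.head_sha
            indexed.append(project.id)
    return indexed
```

Because a push (or a previous backfill run) updates the recorded commit, running the backfill a second time over the same projects does no work.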
I propose that we simply enable indexing and gather data on the rate at which projects become indexed, whether this causes us any performance issues on GitLab.com, etc.
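One way to make "the rate at which projects become indexed" actionable is to reduce periodic counts of indexed projects to a rate and a rough time-to-completion. This is a minimal Python sketch under stated assumptions: samples are `(timestamp, indexed_project_count)` pairs in chronological order, and how they are collected (for example from the monitoring dashboard listed under Blockers) is left open.

```python
from datetime import datetime


def indexing_rate(samples):
    """Projects indexed per hour between the first and last (timestamp, count) samples."""
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    hours = (t1 - t0).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("need samples spanning a positive time interval")
    return (c1 - c0) / hours


def hours_remaining(total_projects, samples):
    """Rough hours to finish, assuming the observed rate stays constant."""
    rate = indexing_rate(samples)
    remaining = max(0, total_projects - samples[-1][1])
    if remaining == 0:
        return 0.0
    if rate <= 0:
        return float("inf")  # no progress observed yet
    return remaining / rate
```

A straight-line estimate like this ignores bursts and stalls, so it is only a first check on whether a given throttling limit would let the backfill finish in reasonable time.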
If it goes well, we can determine suitable Sidekiq throttling limits and run the gitlab:elastic:index_repositories_async rake task, again monitoring the performance impact.
I'd also like to run the database indexing jobs to check that they are sensible at GitLab.com's scale; if not, they'll need something like the index_repositories_async backfill method.
GitLab 9.0 will be upgrading to ElasticSearch 5.1, so we'll have to rebuild the index in any case; this is strictly a data-gathering opportunity. I don't propose enabling searching via ElasticSearch during the 8.17 lifecycle.
Blockers:
- Cluster is currently ES5.2, but 8.17 uses... 2.4? https://gitlab.com/gitlab-com/infrastructure/issues/1130
- Monitoring dashboard for ElasticSearch https://gitlab.com/gitlab-org/gitlab-ee/issues/1617
If given the necessary credentials, I'm perfectly happy to take on this work myself.
What do you need from me to make this happen, @maratkalibek? Any thoughts on the wisdom (or otherwise) of this approach?