Get Elasticsearch in shape
Elasticsearch has been a tricky subject over the past few months, and we need to get it into shape. Roughly speaking, we need the following (in no particular order):
- Better logging: we should be able to see where the indexing process is, what it's indexing, etc.
- Better performance monitoring (as much code should be instrumented as possible)
- Better performing code
- Better infrastructure monitoring (unless this is already taken care of), similar to the monitoring for PostgreSQL (slow queries, etc.)
The third item is the hardest and most time-consuming. The first step is to note down the current performance problems in the various bits of code and create issues for them in GitLab EE (since Elasticsearch is EE-only). We can then assign these issues to whoever would like to work on them.
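To make the logging and instrumentation items in the list above a bit more concrete, here is a minimal Ruby sketch of what "see where the indexing process is and how long each step takes" could look like. The `InstrumentedIndexer` class and its `index_all` method are hypothetical names invented for illustration and do not exist in GitLab EE. The block passed in stands in for the real indexing call.

```ruby
require "logger"
require "benchmark"

# Hypothetical helper (not part of GitLab EE): wraps an indexing loop so that
# every item is logged with its position and timed individually.
class InstrumentedIndexer
  attr_reader :timings

  def initialize(logger: Logger.new($stdout))
    @logger = logger
    @timings = {}
  end

  # Yields each item to the caller's block (the actual indexing work) and
  # records how long that block took, in seconds, keyed by item.
  def index_all(items)
    total = items.size

    items.each_with_index do |item, i|
      @logger.info("indexing #{i + 1}/#{total}: #{item}")
      elapsed = Benchmark.realtime { yield item }
      @timings[item] = elapsed
      @logger.info("indexed #{item} in #{(elapsed * 1000).round(1)}ms")
    end

    @logger.info("done: #{total} items in #{@timings.values.sum.round(3)}s")
    @timings
  end
end
```

Something like `InstrumentedIndexer.new.index_all(project_paths) { |path| index_repository(path) }` would then print one progress line per project, where `project_paths` and `index_repository` are again placeholders. The per-item timings are the kind of data that could feed the performance issues mentioned above.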
@pacoguzman @eReGeBe @maratkalibek @ahmadsherif: In today's call, @maratkalibek volunteered to help out on this, but I'd like to have another performance engineer work specifically on the code part. Unless somebody volunteers for this part, I'll assign somebody randomly using the power of Ruby.
Major work in 8.17 and 9.0:
- Upgrade to Elasticsearch 5.1: https://gitlab.com/gitlab-org/gitlab-ee/issues/1253 / https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/1059
- Monitor the Elasticsearch cluster: https://gitlab.com/gitlab-com/infrastructure/issues/1292
- Improve the database backfill rake task: https://gitlab.com/gitlab-org/gitlab-ee/issues/1839 / https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/1361
- Move repository indexing from rake to sidekiq: https://gitlab.com/gitlab-org/gitlab-ee/issues/1618 / https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/1144
- Improve repository indexing performance: https://gitlab.com/gitlab-org/gitlab-ee/issues/1606 / https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/1372
- Add HTTPS and AWS Elasticsearch cluster support: https://gitlab.com/gitlab-org/gitlab-ee/issues/1181, https://gitlab.com/gitlab-org/gitlab-ee/issues/1373 / https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/1305
- Elasticsearch security fix: https://gitlab.com/gitlab-org/gitlab-ee/issues/1067 / https://dev.gitlab.org/gitlab/gitlab-ee/merge_requests/507
It looks like we're going to need to do some work in 9.1 as well. In particular, I'm worried about the RAM footprint of the elastic_repo_indexer job and the amount of time it takes to run (see https://gitlab.com/gitlab-org/gitlab-ee/issues/1606 for more details on that).
There are also a number of features that need implementing to bring ES up to feature parity with the existing search: https://gitlab.com/gitlab-org/gitlab-ee/issues?scope=all&utf8=%E2%9C%93&state=opened&label_name[]=elasticsearch
We still don't know whether the new repository backfill job is suitable for GitLab.com. All we can do is run it (cautiously) and see what happens (https://gitlab.com/gitlab-com/infrastructure/issues/1157). Database backfill should be fine.
Work in 9.1:
- Allow admin to search all projects: https://gitlab.com/gitlab-org/gitlab-ee/issues/1646
- Split the elasticsearch sidekiq queue into two: https://gitlab.com/gitlab-org/gitlab-ee/issues/2108
- Fix highlighting in code search results: https://gitlab.com/gitlab-org/gitlab-ee/issues/1567
- Introduction of a Go Elasticsearch indexer: https://gitlab.com/gitlab-org/gitlab-elasticsearch-indexer / https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/1483
- New infrastructure issue: https://gitlab.com/gitlab-com/infrastructure/issues/1597