elasticsearch: Index wikis using the external indexer
Problem to solve
Currently, gitlab-ee always indexes only the blobs of a wiki repository, using code from the Ruby indexer; commits are not indexed. However, the Ruby indexer is slated for removal in %12.1 (https://gitlab.com/gitlab-org/gitlab-ee/issues/6481), and it doesn't work in non-NFS deployments anyway.
As a result, Elasticsearch wiki indexing is currently broken in non-NFS deployments. If we've advertised that scenario as working, I think we need to upgrade this to an ~S2 ~bug.
Intended users
Instance administrators
Proposal
Enhance gitlab-elasticsearch-indexer so that it can handle wikis. In practice, this means reusing the existing repository parsing code and writing the content to slightly different field names.
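As a minimal sketch of what "reuse the parsing code, write to different field names" could look like in the indexer (which is written in Go), the snippet builds one document shape for both project and wiki blobs and varies only the type and identifiers. The Blob and Target types, the "wiki_blob" type name, the "rid" field and the "wiki_<project id>" repository ID are all hypothetical illustrations, not the indexer's actual schema.

```go
package main

import "fmt"

// Blob is a hypothetical, simplified view of a file parsed from a Git
// repository. The real indexer's types and fields may differ.
type Blob struct {
	OID       string
	Path      string
	Content   string
	CommitSHA string
}

// Target describes which repository is being indexed. The names it produces
// below ("blob", "wiki_blob", "wiki_<id>") are assumptions for illustration.
type Target struct {
	ProjectID int64
	Wiki      bool
}

// docType returns the hypothetical document type for this target.
func (t Target) docType() string {
	if t.Wiki {
		return "wiki_blob"
	}
	return "blob"
}

// repositoryID returns a hypothetical repository identifier. Wikis get a
// prefixed ID so their documents don't collide with the project's own repo.
func (t Target) repositoryID() string {
	if t.Wiki {
		return fmt.Sprintf("wiki_%d", t.ProjectID)
	}
	return fmt.Sprintf("%d", t.ProjectID)
}

// BuildDocument turns an already-parsed Blob into an Elasticsearch document.
// The parsing step is shared; only the type and identifiers vary.
func BuildDocument(b Blob, t Target) map[string]interface{} {
	kind := t.docType()
	return map[string]interface{}{
		"type": kind,
		kind: map[string]interface{}{
			"oid":        b.OID,
			"rid":        t.repositoryID(),
			"path":       b.Path,
			"content":    b.Content,
			"commit_sha": b.CommitSHA,
		},
	}
}

func main() {
	b := Blob{OID: "abc123", Path: "home.md", Content: "# Home", CommitSHA: "def456"}
	doc := BuildDocument(b, Target{ProjectID: 42, Wiki: true})
	fmt.Println(doc["type"]) // prints "wiki_blob"
}
```

Keeping the type and ID decisions in one place means the existing parsing path doesn't need to know whether it is walking a wiki or a project repository.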
We'll need to modify gitlab-ee so that, instead of directly calling project.wiki.index_blobs (https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/app/models/concerns/elastic/wiki_repositories_search.rb#L26), it enqueues an ElasticCommitIndexerWorker for the project's wiki.
Permissions and Security
No changes required
Documentation
No changes required
Testing
We have existing tests for searching a wiki repository. We could guard against regressions by making them use gitlab-elasticsearch-indexer; that work is implicitly scheduled for %12.1 anyway, so a head start would be nice.
What does success look like, and how can we measure that?
In a gitlab-ee HA deployment that doesn't have NFS or other shared filesystems enabled, wikis can be successfully indexed and searched.
What is the type of buyer?
GitLab Starter GitLab.com Priority
Links / references
cc @vsizov @mdelaossa @smcgivern @DouweM @bcupini