Self-serve Elasticsearch for the gitlab-org group on staging.gitlab.com and gitlab.com
@mwasilewski-gitlab and the production team have done great work so far getting Elasticsearch enabled and working on staging.gitlab.com. See their infrastructure issue: https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/6587#note_165268577
In %12.0, we (the ~Create team) hope to begin self-serving administration of this feature. Production would remain responsible for the Elasticsearch cluster itself and would maintain clear runbooks for turning it off and preventing it from monopolizing Sidekiq. We would be in charge of getting content into the cluster, testing the feature, and eventually enabling and maintaining it, much as was done for the Gitaly rollout and the GCP migration.
To that end, I'll be investigating the two outstanding items @mwasilewski-gitlab identifies in his issue:
- Elasticsearch indexing and Elasticsearch search settings not being completely respected - This is fixed in 12.0. I validated search with indexing enabled but Elasticsearch search disabled, at the instance, group, and project level.
- Create an issue for "Project dropdown at global scope only shows personal projects" https://gitlab.com/gitlab-org/gitlab-ce/issues/50981
- Very long-lived ElasticIndexerWorker jobs - Fixed in 12.0; indexing gitlab-ce takes just 30 minutes, which should be sufficient.
- Broken group-level search for projects needs fixing: https://gitlab.com/gitlab-org/gitlab-ee/issues/12091
I'll also be auditing the index more thoroughly to make sure that:
- All documents that should be indexed are - Some large snippets don't import: https://gitlab.com/gitlab-org/gitlab-ee/issues/12111
- All documents that should not be indexed are not - The only documents outside the gitlab-ce project that were indexed were personal snippets. This is expected.
- Total document numbers are within expectations - Around 743K documents for the gitlab-org group, dominated by gitlab-ce and gitlab-ee.
  - Notes and commits dominate.
- Total index size is within expectations - The gitlab-org group takes ~3.2GiB of index size.
  - The gitlab-ce project by itself takes ~450MiB.
- Indexing wikis works
  - Create an issue to improve rake task performance, or simply integrate into IndexRecordService
- Search performance is improved at both group and project level
- Advanced search syntax works as expected
- Code and commit search works at group level
- Searches at global level still include items from the gitlab-org group
- Removing data from the index works
  - Create an issue for making ElasticIndexedProject work correctly when search is disabled https://gitlab.com/gitlab-org/gitlab-ee/issues/12113
  - Create an issue to resolve the on-delete conflicts problem https://gitlab.com/gitlab-org/gitlab-ee/issues/12114
  - Personal snippets are not removed, but are currently unmanaged
- All search tabs work at group and project scope
- Page 2 of search results works https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/14258
- ... other things to check ...
A lot of this will be free exploration: making sure I look at every aspect of the feature's operation, and searching for anything that seems out of place or unusual.
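The document-count and index-size checks in the audit above can be cross-checked against the cluster directly. The following is a minimal sketch that uses only the Python standard library and the Elasticsearch `_cat/indices` API. The cluster URL and index pattern are placeholders, not the real staging configuration, and the sketch assumes the gitlab-org content lives in the indices matched by that pattern. Note also that `docs.count` in `_cat/indices` counts Lucene documents, so the `_count` API is the more precise source for top-level document numbers.

```python
import json
import urllib.request

GIB = 1024 ** 3  # bytes per GiB, matching the GiB/MiB units used in the audit


def fetch_cat_indices(base_url: str, pattern: str) -> list:
    """Fetch per-index stats from Elasticsearch's _cat/indices API.

    format=json returns a list of dicts. bytes=b reports sizes as plain
    byte counts (still encoded as strings) instead of "3.2gb"-style text.
    """
    url = f"{base_url.rstrip('/')}/_cat/indices/{pattern}?format=json&bytes=b"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def _as_int(value) -> int:
    # Closed indices report null for counts and sizes; treat those as 0.
    return int(value) if value is not None else 0


def summarize(rows: list) -> dict:
    """Total document count and index size across the given _cat rows."""
    docs = sum(_as_int(r.get("docs.count")) for r in rows)
    # store.size includes replica shards; pri.store.size is primaries only.
    store = sum(_as_int(r.get("store.size")) for r in rows)
    primary = sum(_as_int(r.get("pri.store.size")) for r in rows)
    return {
        "docs": docs,
        "store_bytes": store,
        "primary_store_bytes": primary,
        "primary_store_gib": primary / GIB,
    }
```

For example, `summarize(fetch_cat_indices("http://localhost:9200", "gitlab-*"))` would total every index matching the hypothetical `gitlab-*` pattern. The primary-only size is the figure to compare against expectations, because the replica count multiplies `store.size`.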