Unclear how to proceed when Elasticsearch indexing fails to reach 100%
Background
https://gitlab.zendesk.com/agent/tickets/96500
https://gitlab.zendesk.com/agent/tickets/103487
Description
For a variety of reasons, some enterprise customers setting up Advanced Global Search with Elasticsearch, both on-premises and in AWS, see indexing reach ~99.8% complete and then fail, leaving administrators unsure how to proceed.
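For reference, the completion percentage administrators see comes from the bundled rake status task. A minimal sketch, assuming an Omnibus install (the example output line is illustrative):

```shell
# Reports overall indexing status by counting indexed projects
# against the total number of projects.
sudo gitlab-rake gitlab:elastic:index_projects_status
# Example output: "Indexing is 99.80% complete (4990/5000 projects)"
```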
We know that on AWS Elasticsearch, `http.max_content_length` is hard-capped below the size of the largest payloads some enterprise GitLab instances produce. In other cases, corrupt database records cause indexing to throw errors. Re-running the rake commands sometimes indexes a greater percentage of records, yet still never reaches 100%.
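As a sketch of how an admin might confirm the payload-size theory, the effective cap can be read from the cluster settings API. This assumes the Elasticsearch endpoint is reachable as `$ES_URL`; on AWS the cap depends on instance size and cannot be raised:

```shell
# Show the effective http.max_content_length (100mb by default on
# self-managed Elasticsearch; AWS hard-caps it lower on small nodes).
curl -s "$ES_URL/_cluster/settings?include_defaults=true&filter_path=defaults.http.max_content_length"

# Re-run indexing and keep the output, since reruns sometimes pick up
# additional records; grep the log for the records that still fail.
sudo gitlab-rake gitlab:elastic:index 2>&1 | tee elastic_index.log
grep -iE "error|exception" elastic_index.log
```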
Admins would like clearer direction on what to do in these scenarios. Should they troubleshoot indexing? Should they ignore 1% of their code not being searchable? How should they set expectations with users that some data may be missing? Is there anything we can do to make indexing more resilient?
In many of these scenarios, some of the content appears to remain searchable; however, there are conflicting opinions about whether search works at all if the initial indexing process never completes. Some customers have reported that some searches work, but it is unclear to what degree the feature is usable in a less-than-100%-indexed state.
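One way to quantify the partially indexed state is to compare the document count Elasticsearch actually holds against what GitLab reports. A sketch, assuming the default `gitlab-production` index name:

```shell
# Count documents present in the GitLab index
# (gitlab-production is the default name; adjust if renamed).
curl -s "$ES_URL/gitlab-production/_count?pretty"

# Compare against the completion percentage GitLab itself reports.
sudo gitlab-rake gitlab:elastic:index_projects_status
```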
Proposal
Invest some time researching the nature of the indexing problems to determine whether we can fix them, or whether we should instead focus on the admin and user experience when failures occur, so that this doesn't feel like a degraded state.
Links / references
Customers
https://gitlab.my.salesforce.com/00161000004zrF8
https://gitlab.my.salesforce.com/00161000004bZxf