Incomplete Elasticsearch Indexing on GitLab-Licensed Servers
Title: Incomplete Elasticsearch Indexing on GitLab-Licensed Instances
Description:
We are experiencing an issue where not all GitLab projects are being indexed for search on our GitLab-Licensed instances. Some projects are intermittently missing from the Elasticsearch index, impacting search functionality. Running the following rake task resolves the issue temporarily by manually re-indexing all projects:
sudo gitlab-rake gitlab:elastic:index_projects
Prior to re-indexing:
gitlab-rake gitlab:elastic:index_projects_status
Indexing was 98.69% complete
After manually rerunning the indexing task, all projects were successfully indexed.
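As a stopgap while the root cause is open, the manual check-and-re-index cycle above could be scripted and run on a schedule. The following is a minimal sketch, not a recommended fix. It assumes the status task prints a percentage figure in the form quoted above, and the function names (`parse_index_percentage`, `needs_reindex`, `check_and_reindex`) are hypothetical helpers introduced for illustration only.

```python
import re
import subprocess
from typing import Optional

# Assumption: the status task prints a line containing a percentage,
# as in the output quoted above ("... 98.69% complete").
PERCENT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*%")


def parse_index_percentage(status_output: str) -> Optional[float]:
    """Return the first percentage found in the status output, or None."""
    match = PERCENT_RE.search(status_output)
    return float(match.group(1)) if match else None


def needs_reindex(status_output: str, threshold: float = 100.0) -> bool:
    """Return True when the reported percentage is below the threshold.

    Output with no parseable percentage is treated as needing attention,
    so a changed output format does not silently pass as healthy.
    """
    pct = parse_index_percentage(status_output)
    return pct is None or pct < threshold


def check_and_reindex() -> bool:
    """Run the status task and re-index all projects if indexing is incomplete.

    Returns True if a re-index was triggered.
    """
    status = subprocess.run(
        ["sudo", "gitlab-rake", "gitlab:elastic:index_projects_status"],
        capture_output=True, text=True, check=True,
    )
    if not needs_reindex(status.stdout):
        return False
    # Hypothetical alerting hook: replace the print with a real notification.
    print(f"Index incomplete: {status.stdout.strip()}")
    subprocess.run(
        ["sudo", "gitlab-rake", "gitlab:elastic:index_projects"], check=True
    )
    return True
```

Calling `check_and_reindex()` from a cron job or systemd timer on the Rails node would at least surface incomplete indexing as an alert instead of a user-reported search gap. A full re-index can be expensive on large instances, so the schedule should be chosen with that in mind.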
Research Summary:
I investigated related GitLab issues and identified several likely root causes:
- #214601 introduces a search_index_integrity feature intended to detect and correct incomplete indexes. However, it remains behind a feature flag and is not currently recommended for production.
- #360579 and #392981 point to interrupted or silently failing Sidekiq jobs as a potential cause for incomplete indexing.
- GitLab’s ongoing work in #458804 outlines a potential redesign aimed at making indexing more robust and self-healing.
Environment Details:
- GitLab Environment Toolkit (GET) version: 3.7
- Cloud Provider: AWS
- GitLab Edition: Licensed (Premium)
- Elasticsearch: Enabled on licensed instances only (dev and prod)
Request:
- What is the recommended solution or fix for ensuring complete and consistent indexing?
- Is there a known workaround, or is enabling search_index_integrity safe for production in our case?
- Are there any workarounds to avoid these silent failures (e.g. retry strategies, alerting)?
- Are there updates or ETAs regarding the improved indexing mechanism discussed in #458804?
Edited by Abdullah Amer