Incomplete Elasticsearch Indexing on GitLab-Licensed Servers
Title: Incomplete Elasticsearch Indexing on GitLab-Licensed Instances
Description:
We are experiencing an issue where not all GitLab projects are being indexed for search on our GitLab-Licensed instances. Some projects are intermittently missing from the Elasticsearch index, impacting search functionality. Running the following rake task resolves the issue temporarily by manually re-indexing all projects:
sudo gitlab-rake gitlab:elastic:index_projects
Prior to re-indexing:
gitlab-rake gitlab:elastic:index_projects_status
Indexing was 98.69% complete
After rerunning the indexing task, all projects were successfully indexed.
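As a stopgap, the status output above could be checked automatically so that a drop below 100% is noticed before users report missing search results. The following is a minimal sketch, not an official GitLab mechanism. The function names are hypothetical, and it assumes the status task prints a line containing a percentage in the same form as the one shown above.

```shell
#!/bin/sh
# Hypothetical helper: extract the first percentage from status text on stdin.
# Assumption: the status line contains a figure such as "98.69%".
parse_index_percent() {
    grep -Eo '[0-9]+(\.[0-9]+)?%' | head -n 1 | tr -d '%'
}

# Return 0 when the given percentage is at least 100, 1 otherwise.
index_is_complete() {
    awk -v p="$1" 'BEGIN { exit (p + 0 >= 100) ? 0 : 1 }'
}

# Hypothetical check: takes the status text as its only argument.
# Prints OK on stdout, or a warning on stderr with a non-zero return code.
check_index_status() {
    status_text="$1"
    pct=$(printf '%s\n' "$status_text" | parse_index_percent)
    if [ -z "$pct" ]; then
        echo "could not parse indexing status" >&2
        return 2
    fi
    if index_is_complete "$pct"; then
        echo "OK: indexing ${pct}% complete"
    else
        echo "WARN: indexing only ${pct}% complete" >&2
        return 1
    fi
}
```

From a cron job this could be called as `check_index_status "$(sudo gitlab-rake gitlab:elastic:index_projects_status)"`, with a non-zero return code used to raise an alert or to trigger `sudo gitlab-rake gitlab:elastic:index_projects`. It only detects the symptom and does not address why projects go missing from the index.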
Research Summary:
I investigated related GitLab issues and identified several likely root causes:
- #214601 (closed) introduces a search_index_integrity feature intended to detect and correct incomplete indexes. However, it remains behind a feature flag and is not currently recommended for production.
- #360579 (closed) and #392981 (closed) point to interrupted or silently failing Sidekiq jobs as a potential cause for incomplete indexing.
- GitLab’s ongoing work in #458804 (closed) outlines a potential redesign aimed at making indexing more robust and self-healing.
Environment Details:
- GET (GitLab Environment Toolkit) version: 3.7
- Cloud Provider: AWS
- GitLab Edition: Licensed (Premium)
- Elasticsearch: Enabled on licensed instances only (dev and prod)
Request:
- What is the recommended solution or fix for ensuring complete and consistent indexing?
- Is there a known workaround, or is enabling search_index_integrity safe for production in our case?
- Are there any interim mitigations to avoid these silent failures (e.g. retry strategies, alerting)?
- Are there updates or ETAs regarding the improved indexing mechanism discussed in #458804 (closed)?