Incomplete Elasticsearch Indexing on GitLab-Licensed Servers

Description:

Not all GitLab projects are being indexed for search on our GitLab-Licensed instances. Some projects are intermittently missing from the Elasticsearch index, so they do not appear in search results. Manually re-indexing all projects with the following rake task resolves the issue temporarily:

sudo gitlab-rake gitlab:elastic:index_projects

Prior to re-indexing:

gitlab-rake gitlab:elastic:index_projects_status
Indexing was 98.69% complete

After manually rerunning the indexing task, all projects were successfully indexed.
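
As a stopgap, the manual check-and-re-index steps above could be scripted. The following is only a minimal sketch, not a recommended fix. It assumes the status task prints a line of the form "Indexing was NN.NN% complete", as in the output quoted above, and that both tasks are run with sufficient privileges (we used sudo for the manual run). The function names (parse_indexing_percent, needs_reindex, check_and_reindex) and the 100% threshold are hypothetical choices for illustration.

```python
import re
import subprocess

# The two rake tasks quoted in this issue.
STATUS_CMD = ["gitlab-rake", "gitlab:elastic:index_projects_status"]
REINDEX_CMD = ["gitlab-rake", "gitlab:elastic:index_projects"]

# Assumed output format, based only on the status line shown above.
_STATUS_RE = re.compile(r"Indexing was\s+(\d+(?:\.\d+)?)% complete")


def parse_indexing_percent(output: str) -> float:
    """Extract the completion percentage from the status task output."""
    match = _STATUS_RE.search(output)
    if match is None:
        # Fail loudly instead of guessing, so an unexpected format triggers an alert.
        raise ValueError("could not find an indexing percentage in the status output")
    return float(match.group(1))


def needs_reindex(percent: float, threshold: float = 100.0) -> bool:
    """Return True when the reported percentage is below the threshold."""
    return percent < threshold


def check_and_reindex() -> bool:
    """Run the status task; start a re-index if indexing is incomplete.

    Returns True if the re-index task was started, False otherwise.
    """
    status = subprocess.run(STATUS_CMD, capture_output=True, text=True, check=True)
    percent = parse_indexing_percent(status.stdout)
    if needs_reindex(percent):
        subprocess.run(REINDEX_CMD, check=True)
        return True
    return False
```

Calling check_and_reindex() from a scheduled job would repeat what we did by hand. Raising on unparseable output, instead of silently re-indexing, is deliberate: a changed output format should surface as an error, not be masked.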

Research Summary:

I investigated related GitLab issues and found the following likely causes and planned fixes:

  • #214601 introduces a search_index_integrity feature intended to detect and correct incomplete indexes. However, it remains behind a feature flag and is not currently recommended for production.
  • #360579 and #392981 point to interrupted or silently failing Sidekiq jobs as potential causes of incomplete indexing.
  • GitLab’s ongoing work in #458804 outlines a potential redesign aimed at making indexing more robust and self-healing.

Environment Details:

  • GitLab Environment Toolkit (GET) version: 3.7
  • Cloud Provider: AWS
  • GitLab Edition: Licensed (Premium)
  • Elasticsearch: Enabled on licensed instances only (dev and prod)

Request:

  1. What is the recommended fix for ensuring complete and consistent indexing?
  2. Is enabling search_index_integrity safe for production in our case?
  3. Are there known workarounds to detect or avoid these silent failures (e.g. retry strategies, alerting)?
  4. Are there updates or ETAs regarding the improved indexing mechanism discussed in #458804?


Edited by Abdullah Amer