Incomplete Elasticsearch Indexing on GitLab-Licensed Servers

Title: Incomplete Elasticsearch Indexing on GitLab-Licensed Instances

Description:

Not all GitLab projects are being indexed for search on our GitLab-licensed instances. Some projects are intermittently missing from the Elasticsearch index, so search results are incomplete. Manually re-indexing all projects with the following rake task resolves the issue, but only temporarily:

sudo gitlab-rake gitlab:elastic:index_projects

Status prior to re-indexing:

gitlab-rake gitlab:elastic:index_projects_status
Indexing was 98.69% complete

After rerunning the indexing task, all projects were successfully indexed.
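
As an interim check, the two rake tasks above could be combined into a small script run on a schedule. The following is a minimal sketch, not a confirmed fix. It assumes the status task prints a line containing "<number>% complete", as in the output shown above. The function names (parse_index_percent, is_incomplete, check_and_reindex) and the --run argument are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Sketch: re-queue project indexing when the status task reports < 100%.
# Assumption: the status output contains "<number>% complete".
set -eu

# Read status text on stdin and print the first percentage found,
# e.g. "Indexing was 98.69% complete" -> "98.69". Prints nothing if no match.
parse_index_percent() {
  sed -n 's/.*[^0-9.]\([0-9][0-9.]*\)% complete.*/\1/p' | head -n 1
}

# Exit status 0 when the given percentage is below 100, 1 otherwise.
is_incomplete() {
  awk -v p="$1" 'BEGIN { exit !(p + 0 < 100) }'
}

# Query the status and trigger a full project re-index if needed.
check_and_reindex() {
  status="$(sudo gitlab-rake gitlab:elastic:index_projects_status)"
  percent="$(printf '%s\n' "$status" | parse_index_percent)"
  if [ -z "$percent" ]; then
    echo "Could not parse indexing status: $status" >&2
    return 2
  fi
  if is_incomplete "$percent"; then
    echo "Indexing at ${percent}%, re-queuing all projects" >&2
    sudo gitlab-rake gitlab:elastic:index_projects
  else
    echo "Indexing complete (${percent}%)"
  fi
}

# Only touch GitLab when explicitly asked to, e.g. from cron:
#   sh check_index.sh --run
if [ "${1:-}" = "--run" ]; then
  check_and_reindex
fi
```

This only automates the manual workaround. It does not explain why projects drop out of the index, and a full re-index may be expensive on large instances, so the stderr message is better treated as an alert to investigate than as a resolution.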

Research Summary:

I investigated related GitLab issues and found the following possible causes and related work:

  • #214601 (closed) introduces a search_index_integrity feature intended to detect and correct incomplete indexes. However, it remains behind a feature flag and is not currently recommended for production.
  • #360579 (closed) and #392981 (closed) point to interrupted or silently failing Sidekiq jobs as a potential cause of incomplete indexing.
  • GitLab’s ongoing work in #458804 (closed) outlines a potential redesign aimed at making indexing more robust and self-healing.

Environment Details:

  • GitLab Environment Toolkit (GET) version: 3.7
  • Cloud Provider: AWS
  • GitLab Edition: Licensed (Premium)
  • Elasticsearch: Enabled on licensed instances only (dev and prod)

Request:

  1. What is the recommended solution or fix for ensuring complete and consistent indexing?
  2. Is there a known workaround, and is enabling search_index_integrity safe for production in our case?
  3. Are there any interim mitigations to detect or avoid these silent failures (e.g. retry strategies, alerting)?
  4. Are there updates or ETAs regarding the improved indexing mechanism discussed in #458804 (closed)?
