
Projects are not being fully indexed

Summary

It appears that some projects are not being fully indexed into Elasticsearch on GitLab.com. There have been a few instances where a customer reported not seeing files in search, even though the project's index status indicated that indexing had completed. A manual reindex of the project has been shown to fix the issue.

Previously reported under: #250856 (closed)

Steps to reproduce

We have not been able to reproduce this locally. Although a few of the affected projects were imported, the problem does not appear to be limited to imports.

Example Project

What is the current bug behavior?

Project data is partially indexed.

What is the expected correct behavior?

All files should be indexed correctly into Elasticsearch AND the index status should not be updated unless all files were successfully indexed.
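As a rough illustration of the expected behavior, the sketch below records a new indexed commit only when every file was indexed without error. Everything in it is hypothetical and for illustration only: IndexRun, index_files and next_indexed_commit are not GitLab code, and the real indexer may be structured quite differently.

```ruby
# Hypothetical result of one indexing run; not a GitLab class.
IndexRun = Struct.new(:indexed, :failed)

# Attempts every file and collects failures instead of stopping at the first one.
# `indexer` is any callable that raises when a file cannot be indexed.
def index_files(paths, indexer)
  run = IndexRun.new([], [])
  paths.each do |path|
    begin
      indexer.call(path)
      run.indexed << path
    rescue StandardError
      run.failed << path
    end
  end
  run
end

# Returns the commit that should be stored in the index status:
# the new commit only if nothing failed, otherwise the previous one.
def next_indexed_commit(previous_commit, new_commit, run)
  run.failed.empty? ? new_commit : previous_commit
end
```

With this shape, a partially failed run leaves the stored commit unchanged, so a later run would pick up the missing files instead of treating the project as up to date.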

Possible fixes

Based on #259721 (comment 523934361), we should ensure that we "delete if exists" the IndexStatus whenever we run ElasticDeleteProjectWorker. This worker mostly runs when a project is deleted, in which case there is no project.index_status left to remove. To cover the remaining cases, we can do something like IndexStatus.where(project_id: project_id).delete_all in this code to avoid the problem.
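The "delete if exists" idea can be sketched outside Rails as follows. This is only an illustration under stated assumptions: FakeIndexStatusStore and delete_project_from_index are hypothetical stand-ins, not GitLab classes, and the actual change would presumably call IndexStatus.where(project_id: project_id).delete_all inside the worker.

```ruby
# Hypothetical in-memory stand-in for the index status table, so the sketch
# runs without Rails. In GitLab this role is played by the IndexStatus model.
class FakeIndexStatusStore
  def initialize
    @rows = {} # project_id => last indexed commit SHA
  end

  def upsert(project_id, last_commit)
    @rows[project_id] = last_commit
  end

  def exists?(project_id)
    @rows.key?(project_id)
  end

  # Mirrors the semantics of `where(project_id: ...).delete_all`:
  # returns the number of rows removed and does not raise when none match.
  def delete_all_for(project_id)
    @rows.key?(project_id) ? (@rows.delete(project_id); 1) : 0
  end
end

# Hypothetical body of the delete worker: remove the documents from the search
# index, then drop any leftover status so a later indexing run starts from scratch.
# `es_delete` is an assumed callable standing in for the Elasticsearch deletion.
def delete_project_from_index(project_id, store, es_delete: ->(_id) {})
  es_delete.call(project_id)
  store.delete_all_for(project_id)
end
```

Because the delete is keyed on the project ID rather than on a loaded record, it is safe to call whether or not a status row exists, which is the property the proposed fix relies on.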

Workaround in the meantime

Any project code can be completely reindexed by running the following (note that this can be slow for large projects as it reindexes every commit and file again):

project_id = nil # replace nil with the project ID

project = Project.find(project_id)
# Remove the stale index status, if any, so the project is indexed from scratch
project.index_status&.destroy
ElasticCommitIndexerWorker.perform_async(project.id)

Workaround for a whole group

Since #259721 (comment 523934361) implies that the problem is likely to affect an entire group, we may want to index all the repositories in the group again. Because this is expensive for large groups, we should first confirm that the single-project workaround above fixes the affected project before running it:

group = Group.find(<ID>)

project_ids = group.all_projects.pluck(:id)

project_ids.each_slice(50) do |ids|
  p ids # In case we fail half way through we have a trail of where we got up to

  ids.each do |project_id|
    project = Project.find(project_id)
    # Remove the stale index status, if any, so the project is indexed from scratch
    project.index_status&.destroy
    ElasticCommitIndexerWorker.perform_async(project.id)
  end

  sleep(1)
end

ZD Ticket (internal):

Edited by Cynthia "Arty" Ng