
Add project check in index repair service

Terri Chu requested to merge 214601-add-project-in-index-check into master

What does this MR do and why?

Related to #214601 (closed)

This iteration adds a check for whether the project document exists in the index.

I added this check after debugging a customer issue where code search returned no results for a specific project. The code/blob documents were in the index, but the project document was missing. When the project document is missing from the index, code search queries return no results, because project searches in Elasticsearch include a parent_join.
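
To illustrate the failure mode, here is a toy in-memory model of a parent/child join. It is not the actual Elasticsearch query: the `Doc` struct, the `search_blobs` helper, and the document id format are all hypothetical and only show why child documents become unreachable once the parent is gone.

```ruby
# Toy in-memory model of a parent/child join. All names here are illustrative
# assumptions, not the real index mapping or query.
Doc = Struct.new(:id, :type, :parent_id, keyword_init: true)

# Returns blob documents only when their parent project document is present,
# mimicking how a parent-join filter drops children with no matching parent.
def search_blobs(index, project_id)
  parent_id = "project_#{project_id}"
  parent_present = index.any? { |d| d.type == "project" && d.id == parent_id }
  return [] unless parent_present

  index.select { |d| d.type == "blob" && d.parent_id == parent_id }
end

healthy = [
  Doc.new(id: "project_1", type: "project", parent_id: nil),
  Doc.new(id: "1_README.md", type: "blob", parent_id: "project_1")
]
# Same index with the project document deleted, as in the customer issue.
broken = healthy.reject { |d| d.type == "project" }

puts search_blobs(healthy, 1).size # 1: the blob is found
puts search_blobs(broken, 1).size  # 0: the blob exists but cannot be matched
```

In the second case the blob is still in the index, which is why the problem is easy to miss when only blob documents are checked.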

The MR also includes some refactoring in the search index repair service:

  • split checks into methods to increase readability and allow adding/removing checks quickly
  • add routing to all of the Elasticsearch queries to speed them up
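
A minimal sketch of the shape this refactoring suggests, assuming one method per check and a shared routing value passed to every query. The class name, method names, the `project_<id>` routing key, the blob warning message, and the `FakeClient` stand-in are hypothetical; this is not the actual `Search::IndexRepairService` code.

```ruby
# Hypothetical sketch: names and the routing key format are assumptions.
class IndexRepairSketch
  # Each check is its own method, so adding or removing one is a one-line change.
  CHECKS = %i[check_project_document check_blob_documents].freeze

  attr_reader :warnings

  def initialize(project_id, client)
    @project_id = project_id
    @client = client
    @warnings = []
  end

  def execute
    CHECKS.each { |check| send(check) }
    true
  end

  private

  # Assumed routing key; routing lets a query target a single shard.
  def routing
    "project_#{@project_id}"
  end

  def document_count(type)
    @client.count(type: type, project_id: @project_id, routing: routing)
  end

  def check_project_document
    @warnings << "project document missing from index" if document_count("project").zero?
  end

  def check_blob_documents
    @warnings << "blob documents missing from index" if document_count("blob").zero?
  end
end

# Stand-in for an Elasticsearch client; it records the routing value it receives.
class FakeClient
  attr_reader :routings

  def initialize(counts)
    @counts = counts
    @routings = []
  end

  def count(type:, project_id:, routing:)
    @routings << routing
    @counts.fetch(type, 0)
  end
end

client = FakeClient.new("blob" => 12) # blobs indexed, project document absent
service = IndexRepairSketch.new(1, client)
service.execute
puts service.warnings.inspect
```

Keeping the routing value in one private method means every query picks it up without each check having to remember it.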

Screenshots or screen recordings

N/A

How to set up and validate locally

  1. set up GDK for Elasticsearch
  2. index your GDK data: bundle exec rake gitlab:elastic:index
  3. enable Advanced Search in the admin UI: http://gdk.test:3000/admin/application_settings/advanced_search
  4. enable the search_index_integrity feature flag: Feature.enable(:search_index_integrity)
  5. find a project with repository data and note the project id
  6. delete the project document from the index (replace the id 1 with the project id from the previous step)
curl --request POST \
  --url http://localhost:9200/gitlab-development/_delete_by_query \
  --header 'Content-Type: application/json' \
  --data '{
	"query": {
		"bool": {
			"must": [
				{
					"term": {
						"type": {
							"value": "project"
						}
					}
				},
				{
					"term": {
						"id": {
							"value": 1
						}
					}
				}
			]
		}
	}
}'
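
If hand-editing the JSON is error-prone, the same request can be built from Ruby. This is a sketch only: the helper names `delete_project_document_query` and `build_delete_request` are hypothetical, and the URL assumes the local `gitlab-development` index used in the curl command. The request is prepared but not sent unless the last line is uncommented.

```ruby
require "json"
require "net/http"
require "uri"

# Builds the same bool/term query as the curl command for a given project id.
def delete_project_document_query(project_id)
  {
    query: {
      bool: {
        must: [
          { term: { type: { value: "project" } } },
          { term: { id: { value: project_id } } }
        ]
      }
    }
  }
end

# Prepares (but does not send) the POST request against the local development index.
def build_delete_request(project_id, base_url: "http://localhost:9200/gitlab-development")
  uri = URI("#{base_url}/_delete_by_query")
  request = Net::HTTP::Post.new(uri, "Content-Type" => "application/json")
  request.body = JSON.generate(delete_project_document_query(project_id))
  [uri, request]
end

uri, request = build_delete_request(1)
puts request.body
# To actually delete the document, send the request:
# Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request) }
```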
  7. run the repair service from the Rails console:
[1] pry(main)> project = Project.find(1)
  Project Load (2.3ms)  SELECT "projects".* FROM "projects" WHERE "projects"."id" = 1 LIMIT 1 /*application:console,db_config_name:main,console_hostname:terrichus-MBP.localdomain,console_username:terrichu,line:(pry):1:in `__pry__'*/
  Route Load (0.6ms)  SELECT "routes".* FROM "routes" WHERE "routes"."source_id" = 1 AND "routes"."source_type" = 'Project' LIMIT 1 /*application:console,db_config_name:main,console_hostname:terrichus-MBP.localdomain,console_username:terrichu,line:/app/models/concerns/routable.rb:141:in `block in full_attribute'*/
=> #<Project id:1 toolbox/gitlab-smoke-tests>>
[2] pry(main)> ::Search::IndexRepairService.execute(project)
  Namespace Load (1.3ms)  SELECT "namespaces".* FROM "namespaces" WHERE "namespaces"."id" = 22 LIMIT 1 /*application:console,db_config_name:main,console_hostname:terrichus-MBP.localdomain,console_username:terrichu,line:/app/models/project.rb:2911:in `root_namespace'*/
=> true
  8. verify in log/elasticsearch.log that the service logs that the project document is missing
{"severity":"WARN","time":"2023-03-30T18:42:52.897Z","correlation_id":null,"class":"Search::IndexRepairService","message":"project document missing from index","namespace_id":22,"root_namespace_id":22,"project_id":1}
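
The log check can also be scripted rather than read by eye. The sketch below assumes each log line is a single JSON object shaped like the entry above; the helper name `missing_project_warnings` is hypothetical.

```ruby
require "json"

# Returns the parsed log entries in which the repair service reported a
# missing project document for the given project id. Non-JSON lines are skipped.
def missing_project_warnings(lines, project_id)
  lines.filter_map do |line|
    entry = begin
      JSON.parse(line)
    rescue JSON::ParserError
      nil
    end
    next unless entry.is_a?(Hash)
    next unless entry["class"] == "Search::IndexRepairService"
    next unless entry["message"] == "project document missing from index"
    next unless entry["project_id"] == project_id

    entry
  end
end

sample = [
  '{"severity":"WARN","time":"2023-03-30T18:42:52.897Z","correlation_id":null,"class":"Search::IndexRepairService","message":"project document missing from index","namespace_id":22,"root_namespace_id":22,"project_id":1}',
  "not a json line"
]

missing_project_warnings(sample, 1).each { |entry| puts entry["time"] }
# Against the real log file:
# missing_project_warnings(File.foreach("log/elasticsearch.log"), 1)
```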

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Terri Chu
