
Add index integrity worker

What does this MR do and why?

Related to #214601 (closed)

Initial work to create an index integrity worker. This MR introduces:

  • a new index integrity worker
  • a new index repair service
  • a new after_action for the search controller (EE only) that enqueues the index integrity worker
  • specs for everyone!
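The list above names the moving parts. The following plain-Ruby sketch shows one way they could fit together, based only on what the log output in the validation steps shows: a namespace-level job fans out into per-project jobs, and a project-level check warns when blob documents are missing even though the index status records the current HEAD commit as indexed. All names here (ProjectState, IntegrityCheckSketch, perform, check_project) and the exact detection rule are assumptions for illustration, not this MR's implementation.

```ruby
# Hypothetical sketch: names and the detection rule are assumptions, not the MR's code.

# Stand-in for the project data the check needs.
ProjectState = Struct.new(
  :id, :namespace_id, :head_commit, :index_status_last_commit, :blob_doc_count,
  keyword_init: true
)

class IntegrityCheckSketch
  attr_reader :enqueued, :warnings

  def initialize(projects)
    @projects = projects # Array of ProjectState
    @enqueued = []       # project IDs a real worker would enqueue as new jobs
    @warnings = []       # stand-in for WARN entries in elasticsearch.log
  end

  # Namespace jobs fan out to one job per project; project jobs run the check.
  def perform(namespace_id: nil, project_id: nil)
    if project_id
      check_project(project_id)
    elsif namespace_id
      @projects.select { |p| p.namespace_id == namespace_id }
               .each { |p| @enqueued << p.id }
    end
  end

  private

  # Assumed rule: the index status says HEAD is indexed, yet the index
  # holds no blob documents for the project.
  def check_project(project_id)
    project = @projects.find { |p| p.id == project_id }
    return if project.nil? || project.head_commit.nil? # unknown project or empty repo

    indexed_at_head = project.index_status_last_commit == project.head_commit
    return unless indexed_at_head && project.blob_doc_count.zero?

    @warnings << { message: "blob documents missing from index for project",
                   project_id: project.id }
  end
end

project = ProjectState.new(id: 7, namespace_id: 33, head_commit: "f15b322",
                           index_status_last_commit: "f15b322", blob_doc_count: 0)
checker = IntegrityCheckSketch.new([project])
checker.perform(namespace_id: 33) # checker.enqueued is now [7]
checker.perform(project_id: 7)    # checker.warnings now holds one entry for project 7
```

Keeping the fan-out and the per-project check as separate entry points mirrors the logs, where the namespace job only enqueues and the follow-up project job does the checking.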

Screenshots or screen recordings

N/A (all backend work)

How to set up and validate locally

Get all blobs for a project (note: replace the project_id value, 7 below, with the ID of the project you are working with):
curl --request POST \
  --url http://localhost:9200/gitlab-development/_search \
  --header 'Content-Type: application/json' \
  --data '{
	"query": {
		"bool": {
			"must": [
				{
					"term": {
						"type": {
							"value": "blob"
						}
					}
				},
				{
					"term": {
						"project_id": {
							"value": 7
						}
					}
				}
			]
		}
	}
}'
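If you only need the number of matching blobs, Elasticsearch's _count API accepts the same query body and returns {"count": N, ...} instead of the default first page of 10 hits. Below is a minimal Ruby sketch of that, assuming the GDK defaults used above (localhost:9200, index gitlab-development); the helper names blob_query and count_blobs are hypothetical and not part of this MR.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper: the same bool query as the curl command above.
# Integer() raises on nil or non-numeric input instead of silently matching nothing.
def blob_query(project_id)
  {
    query: {
      bool: {
        must: [
          { term: { type: { value: "blob" } } },
          { term: { project_id: { value: Integer(project_id) } } }
        ]
      }
    }
  }
end

# Hypothetical helper: POSTs the query to the _count API and returns the count.
# Defaults assume the GDK setup used above.
def count_blobs(project_id, base_url: "http://localhost:9200", index: "gitlab-development")
  uri = URI("#{base_url}/#{index}/_count")
  response = Net::HTTP.post(uri, JSON.generate(blob_query(project_id)),
                            "Content-Type" => "application/json")
  unless response.is_a?(Net::HTTPSuccess)
    raise "Elasticsearch returned #{response.code}: #{response.body}"
  end

  JSON.parse(response.body).fetch("count")
end
```

With GDK running, calling count_blobs(7) before and after removing the blob data should show a positive number dropping to 0 once the index has refreshed.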
Remove blob data for a project (note: replace the project_id value, 7 below, with the ID of the project you are working with):
curl --request POST \
  --url http://localhost:9200/gitlab-development/_delete_by_query \
  --header 'Content-Type: application/json' \
  --data '{
	"query": {
		"bool": {
			"must": [
				{
					"term": {
						"type": {
							"value": "blob"
						}
					}
				},
				{
					"term": {
						"project_id": {
							"value": 7
						}
					}
				}
			]
		}
	}
}'
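The _delete_by_query response body is a quick way to confirm the deletion worked: it reports how many documents matched ("total"), how many were removed ("deleted"), whether the request timed out ("timed_out"), and any "failures". The Ruby sketch below assumes you have the raw response body as a string (for example, captured from the curl command above) and checks those fields; delete_summary is a hypothetical helper name. Deleted documents drop out of search results only after the next index refresh, so if a follow-up query still shows them, adding ?refresh=true to the _delete_by_query URL forces an immediate refresh.

```ruby
require "json"

# Hypothetical helper: summarises a _delete_by_query response body.
# "total" = documents matched, "deleted" = documents removed,
# "failures" = errors reported by the request, "timed_out" = request timeout flag.
def delete_summary(body)
  data = JSON.parse(body)
  ok = !data["timed_out"] &&
       data["failures"].to_a.empty? &&
       data["deleted"] == data["total"]
  { ok: ok, deleted: data["deleted"], total: data["total"] }
end
```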
  1. make sure that GDK is set up for Elasticsearch, the indexes are created, and advanced search is enabled
  2. enable the feature flag in the Rails console: Feature.enable(:search_index_integrity)
  3. perform a code search for one of the projects (I chose flightjs/Flight)
  4. verify that results come back
  5. verify how many blobs exist by running the "get all blobs for a project" query above against the Elasticsearch instance (localhost:9200 in GDK)
  6. delete all of those blobs from the index with the "remove blob data for a project" query above, then rerun the first query against the Elasticsearch instance to verify they are gone
  7. run the same project search in GDK and verify that it returns no results
  8. verify that the index integrity worker runs for the project: gdk tail rails-background-jobs
2023-02-24_16:46:44.17683 rails-background-jobs : {"severity":"INFO","time":"2023-02-24T16:46:44.176Z","retry":25,"queue":"default","backtrace":true,"version":0,"args":["[FILTERED]","7"],"class":"Search::IndexIntegrityWorker","jid":"a4ec773b907c8dc992937286","created_at":"2023-02-24T16:46:44.169Z","correlation_id":"01GT253QZTNSJYMDJ5YWXQH059","meta.caller_id":"SearchController#show","meta.remote_ip":"127.0.0.1","meta.feature_category":"global_search","meta.user":"root","meta.user_id":1,"meta.project":"flightjs/Flight","meta.root_namespace":"flightjs","meta.client_id":"user/1","meta.root_caller_id":"SearchController#show","worker_data_consistency":"delayed","wal_locations":{},"wal_location_source":"primary","idempotency_key":"resque:gitlab:duplicate:default:9ee35784997d131608e87df5fe6834da84cd1db76ad5694e968d4bba514b5386","size_limiter":"validated","enqueued_at":"2023-02-24T16:46:44.175Z","job_size_bytes":8,"pid":22084,"message":"Search::IndexIntegrityWorker JID-a4ec773b907c8dc992937286: start","job_status":"start","scheduling_latency_s":0.000779}
  9. verify that the index repair service adds a log entry to the <gdk_dir>/gitlab/log/elasticsearch.log file
{"severity":"WARN","time":"2023-02-24T16:46:49.796Z","correlation_id":"01GT253XHS8ENKNJ51FQ7FNFHN","class":"Search::IndexRepairService","message":"blob documents missing from index for project","project_id":7,"project_commit":{"id":"f15b32277d2c55c6c595845a87109b09c913c556","message":"v1.5.2\n","parent_ids":["8749d49930866a4871fa086adbd7d2057fcc3ebb"],"authored_date":"2017-06-19T14:39:45.000-07:00","author_name":"Andrew Lunny","author_email":"alunny@twitter.com","committed_date":"2017-06-19T14:39:53.000-07:00","committer_name":"Andrew Lunny","committer_email":"alunny@twitter.com","trailers":{}},"project_last_repository_updated_at":"2023-02-17T20:09:02.537Z","index_status_last_commit":"f15b32277d2c55c6c595845a87109b09c913c556","index_status_indexed_at":"2023-02-22T19:10:45.757Z","repository_size":765460}
  10. run a group search and verify that it returns no results
  11. verify that the index integrity worker runs for the namespace and queues a new job for the project
2023-02-24_16:50:11.83565 rails-background-jobs : {"severity":"INFO","time":"2023-02-24T16:50:11.835Z","retry":25,"queue":"default","backtrace":true,"version":0,"args":["33","[FILTERED]"],"class":"Search::IndexIntegrityWorker","jid":"a61110823e8d8580a50ac776","created_at":"2023-02-24T16:50:11.817Z","correlation_id":"01GT25A2ZFSNBBD0ZQPCP1X7BJ","meta.caller_id":"SearchController#show","meta.remote_ip":"127.0.0.1","meta.feature_category":"global_search","meta.user":"root","meta.user_id":1,"meta.root_namespace":"flightjs","meta.client_id":"user/1","meta.root_caller_id":"SearchController#show","worker_data_consistency":"delayed","wal_locations":{},"wal_location_source":"primary","idempotency_key":"resque:gitlab:duplicate:default:09ee9c6ffd8ed564b18d6501bbb59805758a83e5f8fae3df2f57d9a483697b0f","size_limiter":"validated","enqueued_at":"2023-02-24T16:50:11.818Z","job_size_bytes":9,"pid":22084,"message":"Search::IndexIntegrityWorker JID-a61110823e8d8580a50ac776: start","job_status":"start","scheduling_latency_s":0.014567}
2023-02-24_16:50:11.90737 rails-background-jobs : {"severity":"INFO","time":"2023-02-24T16:50:11.906Z","retry":25,"queue":"default","backtrace":true,"version":0,"args":["33","7"],"class":"Search::IndexIntegrityWorker","jid":"1a0d4184e9609e27926d9bc6","created_at":"2023-02-24T16:50:11.863Z","correlation_id":"01GT25A2ZFSNBBD0ZQPCP1X7BJ","meta.caller_id":"Search::IndexIntegrityWorker","meta.remote_ip":"127.0.0.1","meta.feature_category":"global_search","meta.user":"root","meta.user_id":1,"meta.root_namespace":"flightjs","meta.client_id":"user/1","meta.root_caller_id":"SearchController#show","worker_data_consistency":"delayed","wal_locations":{},"wal_location_source":"primary","idempotency_key":"resque:gitlab:duplicate:default:decdc9edc9ae65ad6e33020e998c181f40b6759e672bcf2a0b857f1a822a5707","size_limiter":"validated","enqueued_at":"2023-02-24T16:50:11.864Z","job_size_bytes":6,"pid":22084,"message":"Search::IndexIntegrityWorker JID-1a0d4184e9609e27926d9bc6: done: 0.041452 sec","job_status":"done","scheduling_latency_s":0.000683,"gitaly_calls":1,"gitaly_duration_s":0.012291,"redis_calls":4,"redis_duration_s":0.000812,"redis_read_bytes":215,"redis_write_bytes":281,"redis_queues_calls":2,"redis_queues_duration_s":0.000289,"redis_queues_read_bytes":2,"redis_queues_write_bytes":186,"redis_repository_cache_calls":2,"redis_repository_cache_duration_s":0.000523,"redis_repository_cache_read_bytes":213,"redis_repository_cache_write_bytes":95,"elasticsearch_calls":1,"elasticsearch_duration_s":0.006586,"elasticsearch_timed_out_count":0,"db_count":4,"db_write_count":0,"db_cached_count":0,"db_replica_count":0,"db_primary_count":4,"db_main_count":4,"db_ci_count":0,"db_main_replica_count":0,"db_ci_replica_count":0,"db_replica_cached_count":0,"db_primary_cached_count":0,"db_main_cached_count":0,"db_ci_cached_count":0,"db_main_replica_cached_count":0,"db_ci_replica_cached_count":0,"db_replica_wal_count":0,"db_primary_wal_count":0,"db_main_wal_count":0,"db_ci_wal_count":0,"db_main_replica_wal_count":0,"db_ci_replica_wal_count":0,"db_replica_wal_cached_count":0,"db_primary_wal_cached_count":0,"db_main_wal_cached_count":0,"db_ci_wal_cached_count":0,"db_main_replica_wal_cached_count":0,"db_ci_replica_wal_cached_count":0,"db_replica_duration_s":0.0,"db_primary_duration_s":0.007,"db_main_duration_s":0.007,"db_ci_duration_s":0.0,"db_main_replica_duration_s":0.0,"db_ci_replica_duration_s":0.0,"cpu_s":0.016403,"worker_id":"sidekiq_0","rate_limiting_gates":[],"duration_s":0.041452,"completed_at":"2023-02-24T16:50:11.906Z","load_balancing_strategy":"primary_no_wal","db_duration_s":0.00312}
  12. verify that the index integrity worker runs for the namespace: gdk tail rails-background-jobs
{"severity":"INFO","time":"2023-02-24T16:50:11.845Z","correlation_id":"01GT25A2ZFSNBBD0ZQPCP1X7BJ","class":"Search::IndexIntegrityWorker","message":"enqueueing all projects for namespace","namespace_id":33}
  13. verify that the index repair service adds a log entry to the <gdk_dir>/gitlab/log/elasticsearch.log file
{"severity":"WARN","time":"2023-02-24T16:50:11.905Z","correlation_id":"01GT25A2ZFSNBBD0ZQPCP1X7BJ","class":"Search::IndexRepairService","message":"blob documents missing from index for project","project_id":7,"project_commit":{"id":"f15b32277d2c55c6c595845a87109b09c913c556","message":"v1.5.2\n","parent_ids":["8749d49930866a4871fa086adbd7d2057fcc3ebb"],"authored_date":"2017-06-19T14:39:45.000-07:00","author_name":"Andrew Lunny","author_email":"alunny@twitter.com","committed_date":"2017-06-19T14:39:53.000-07:00","committer_name":"Andrew Lunny","committer_email":"alunny@twitter.com","trailers":{}},"project_last_repository_updated_at":"2023-02-17T20:09:02.537Z","index_status_last_commit":"f15b32277d2c55c6c595845a87109b09c913c556","index_status_indexed_at":"2023-02-22T19:10:45.757Z","repository_size":765460}
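The verification steps above mostly come down to spotting specific JSON entries in the logs. The following Ruby sketch filters log lines by their "class" field. It assumes the two line shapes shown above: gdk tail output prefixed with "<timestamp> rails-background-jobs : ", and raw JSON lines from elasticsearch.log. log_entries_for is a hypothetical helper, not part of GDK or this MR.

```ruby
require "json"

# Hypothetical helper: extracts JSON log entries for one class from either
# gdk tail output ("<timestamp> rails-background-jobs : {...}")
# or raw elasticsearch.log lines ("{...}").
def log_entries_for(lines, klass)
  lines.filter_map do |line|
    line = line.strip
    json = line.start_with?("{") ? line : line.split(" : ", 2)[1]
    next if json.nil? # no JSON payload on this line

    begin
      entry = JSON.parse(json)
    rescue JSON::ParserError
      next # skip non-JSON or truncated lines
    end
    entry if entry["class"] == klass
  end
end
```

For example, log_entries_for(File.foreach("<gdk_dir>/gitlab/log/elasticsearch.log"), "Search::IndexRepairService") should return the "blob documents missing from index for project" warnings, and each entry's "project_id" shows which project needs repair.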

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Terri Chu
