Backfill traversal_ids in notes index
What does this MR do and why?
This change adds a migration to backfill traversal IDs in the notes index for Elasticsearch. The migration is scheduled for version 18.1 and will process notes in batches of 10,000 with a 1-minute delay between batches to reduce system load. The code includes a test file that verifies the migration works correctly with different types of notes (issue notes, project snippet notes, commit notes, and merge request notes). There's also a small fix to the shared test examples to correctly handle test cases with more than 4 objects. This migration is part of the Global Search group's efforts to improve search functionality.
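As a rough illustration of what the batching settings imply, the sketch below estimates the minimum time spent waiting between batches. The helper name `minimum_throttle_minutes` is hypothetical and not part of the migration. It assumes every batch is full and ignores the time Elasticsearch needs to process each batch, so it is only a lower bound.

```ruby
# Hypothetical helper: lower bound on the minutes spent in throttle delays.
# Defaults mirror the values described in this MR (10,000 per batch, 1 minute delay).
def minimum_throttle_minutes(total_notes, batch_size: 10_000, delay_minutes: 1)
  return 0 if total_notes <= 0

  # Integer ceiling division: number of batches needed to cover all notes.
  batches = (total_notes + batch_size - 1) / batch_size

  # A delay occurs between consecutive batches, so there is one fewer delay than batches.
  (batches - 1) * delay_minutes
end

puts minimum_throttle_minutes(1_000_000) # 99
puts minimum_throttle_minutes(25_000)    # 2
```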
This MR is step 2 in improving notes query performance. The notes index currently uses legacy authorization in Elasticsearch queries, which can send thousands of `project_id` values to Elasticsearch. Switching to the new authorization will improve performance for global and group searches (the same was previously done for code, merge requests, issues, etc.). The query needs to be improved because it is now used during issue search. The plan for improving the query is:
- add `traversal_ids` to notes index !193056 (merged)
- backfill `traversal_ids` in notes index (this MR)
- switch notes index to use new authorization in queries (behind a FF)
- remove FF
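To show why the plan above should shrink the query, here is a minimal sketch contrasting the two filter shapes. It is not the actual GitLab query builder: the method names are hypothetical, and it assumes `traversal_ids` is indexed as a string of ancestor namespace IDs joined with `-` and ending in `-` (for example `"9970-42-"`), so that a single prefix clause can match everything under a namespace.

```ruby
# Legacy-style authorization: one terms filter that lists every accessible project.
def legacy_project_filter(project_ids)
  { terms: { project_id: project_ids } }
end

# Traversal-based authorization (assumed shape): one prefix clause per authorized
# namespace, instead of one term per project.
def traversal_ids_filter(namespace_traversal_ids)
  prefixes = namespace_traversal_ids.map { |ids| "#{ids.join('-')}-" }
  {
    bool: {
      should: prefixes.map { |prefix| { prefix: { traversal_ids: { value: prefix } } } },
      minimum_should_match: 1
    }
  }
end

legacy = legacy_project_filter((1..5_000).to_a)
modern = traversal_ids_filter([[9970]])

puts legacy[:terms][:project_id].size # 5000
puts modern[:bool][:should].size      # 1
```

In this sketch, a user with access to one top-level group holding 5,000 projects sends 5,000 terms with the legacy filter and one prefix clause with the traversal-based filter.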
References
- Improve notes query performance (#549170 - closed)
- [FF] `search_work_item_queries_notes` -- advanc... (#536912 - closed)
- MR1: Add traversal_ids to notes index (!193056 - merged)
- [follow up for MR1] Use namespace to generate traversal_id in note ... (!193344 - merged)
How to set up and validate locally
After checking out the branch, you will probably need to restart rails-background-jobs to make sure the workers pick up the Sidekiq jobs.
- enable advanced search
- reset schema_version in indexed data
  `curl --request POST --url 'http://localhost:9200/gitlab-development-notes/_update_by_query?wait_for_completion=true&refresh=true' --header 'Content-Type: application/json' --data '{ "script": { "source": "ctx._source.schema_version=2222" }, "query": { "match_all": {} } }'`
- remove migration from migrations index (if it exists)
  `curl --request DELETE --url http://localhost:9200/gitlab-development-migrations/_doc/20250530160142`
- open rails console, run the migration worker:
  `Elastic::MigrationWorker.new.perform`
- (optional) in rails console, run the indexing process manually:
  `Elastic::ProcessInitialBookkeepingService.new.execute`
- run the migration worker:
  `Elastic::MigrationWorker.new.perform` until it's completed
- watch the logs in `log/elasticsearch.log` to make sure the migration runs and passes
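In addition to watching the log, progress can be checked by counting the documents that still carry an old `schema_version`. The sketch below assumes the migration selects documents by `schema_version`, as the reset step above suggests. The method names are hypothetical, and the target version must be supplied by the reader because it is not stated in this description.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Body for the Elasticsearch _count API: documents whose schema_version is still
# below the target version (the target is an assumption supplied by the caller).
def pending_backfill_query(target_schema_version)
  { query: { range: { schema_version: { lt: target_schema_version } } } }
end

# Sends the count request to a local Elasticsearch and returns the number of
# documents that still need to be backfilled.
def pending_backfill_count(target_schema_version,
                           base_url: 'http://localhost:9200',
                           index: 'gitlab-development-notes')
  uri = URI("#{base_url}/#{index}/_count")
  body = JSON.generate(pending_backfill_query(target_schema_version))
  response = Net::HTTP.post(uri, body, 'Content-Type' => 'application/json')
  JSON.parse(response.body).fetch('count')
end

# Example (requires Elasticsearch running locally; 2_223 is a placeholder value):
# puts pending_backfill_count(2_223)
```

A count of zero would indicate that every note document has been rewritten with the new schema version.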
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.