Use scroll API in schema version search migration helper
What does this MR do and why?
This code improves how a search system migrates data by adding a new "scroll API" method alongside the existing search method.
The scroll API is better suited to processing large amounts of data because it maintains a cursor position, allowing the system to page through results efficiently without missing or duplicating records. The system chooses the method automatically: it uses the scroll API when there are many documents to process or when continuing an existing scroll session; otherwise it uses the regular search method.
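The selection rule described above could be sketched roughly as follows. This is only an illustration: `choose_search_strategy`, its parameters, and `DEFAULT_QUEUE_THRESHOLD` are hypothetical names, and comparing the remaining document count against the threshold is an assumption about what "many documents" means, not the helper's confirmed logic.

```ruby
# Hypothetical sketch: method and parameter names are illustrative and are not
# taken from the helper itself.
DEFAULT_QUEUE_THRESHOLD = 50_000 # assumed default, mirroring the original value shown in the diff

# Returns :scroll or :search for the next batch.
def choose_search_strategy(remaining_documents:, scroll_id: nil, threshold: DEFAULT_QUEUE_THRESHOLD)
  # An open scroll session is always continued, however many documents remain.
  return :scroll if scroll_id

  # Assumption: "many documents" means the remaining count exceeds the threshold.
  remaining_documents > threshold ? :scroll : :search
end

choose_search_strategy(remaining_documents: 16_000)                    # => :search
choose_search_strategy(remaining_documents: 16_000, threshold: 10_000) # => :scroll
choose_search_strategy(remaining_documents: 10, scroll_id: 'abc')      # => :scroll
```

Under these assumptions, lowering the threshold below the number of indexed documents is enough to push a local test onto the scroll path.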
The changes also improve error handling by cleaning up scroll sessions when they are no longer needed, and add logging to track migration progress. The code was refactored to be more modular, separating the logic for each type of search into distinct methods.
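As a rough illustration of the cleanup idea, the sketch below clears the scroll session in an `ensure` block so it is released even when batch processing raises. `FakeScrollClient` and `each_scrolled_batch` are hypothetical stand-ins so the example runs without a cluster; the keyword arguments mirror the `search`, `scroll`, and `clear_scroll` calls of the Elasticsearch Ruby client. It also loops in a single call for brevity, whereas the migration processes one batch per run, so treat it as a simplification rather than the helper's actual code.

```ruby
# Stand-in for an Elasticsearch client so the sketch runs without a cluster.
# Hypothetical: it only imitates the response shape of search/scroll.
class FakeScrollClient
  attr_reader :cleared

  def initialize(pages)
    @pages = pages.dup
    @cleared = []
  end

  def search(index:, scroll:, body:)
    next_page
  end

  def scroll(scroll:, body:)
    next_page
  end

  def clear_scroll(body:)
    @cleared << body[:scroll_id]
  end

  private

  # Returns the next page of hits, or an empty page once the data is exhausted.
  def next_page
    { '_scroll_id' => 'scroll-1', 'hits' => { 'hits' => @pages.shift || [] } }
  end
end

# Yields each batch of hits and always releases the scroll session afterwards.
def each_scrolled_batch(client, index:, batch_size:, timeout: '5m')
  response = client.search(index: index, scroll: timeout,
                           body: { size: batch_size, query: { match_all: {} } })
  scroll_id = response['_scroll_id']

  until response['hits']['hits'].empty?
    yield response['hits']['hits']
    response = client.scroll(scroll: timeout, body: { scroll_id: scroll_id })
    scroll_id = response['_scroll_id']
  end
ensure
  # scroll_id is nil if the initial search failed, so there is nothing to clear.
  client.clear_scroll(body: { scroll_id: scroll_id }) if scroll_id
end
```

Clearing the session explicitly, instead of waiting for the scroll timeout, frees the search context on the cluster as soon as the work finishes or fails.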
Additionally, some technical debt was cleaned up by standardizing how the system identifies the primary key field for different document types, making the code more maintainable and consistent across different data models.
How to set up and validate locally
- Set up Elasticsearch in GDK.
- I created a fake migration to test this, and I have about 16,000 projects in my index.
- I changed the queue threshold and the project index `SCHEMA_VERSION` to force it to use the scroll API.
- You can verify that the scroll API is being used by running the migration worker and checking the migration state in the index:
  - In the Rails console, kick off the migration worker: `Elastic::MigrationWorker.new.perform`
  - Verify in the migration index: `curl --request GET --url http://localhost:9200/gitlab-development-migrations/_doc/20250724114546`
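The same check can be scripted from Ruby instead of `curl`. The sketch below is a minimal example that assumes the local host and index name from the steps above; `migration_doc_uri`, `extract_migration_source`, and `fetch_migration_source` are hypothetical helper names. It relies only on the `found` and `_source` keys that the Elasticsearch get-document API returns and makes no assumption about the fields stored in the migration state.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Builds the document URL for a migration version.
# The default host and index are the local GDK values used in the steps above.
def migration_doc_uri(version, host: 'http://localhost:9200', index: 'gitlab-development-migrations')
  URI.join("#{host}/", "#{index}/_doc/#{version}")
end

# Returns the stored migration document, or nil if Elasticsearch reports it as not found.
def extract_migration_source(response_body)
  doc = JSON.parse(response_body)
  doc['found'] ? doc['_source'] : nil
end

# Performs the HTTP request; requires a running local Elasticsearch instance.
def fetch_migration_source(version)
  response = Net::HTTP.get_response(migration_doc_uri(version))
  extract_migration_source(response.body)
end
```

With a local cluster running, `fetch_migration_source(20250724114546)` returns the stored state as a hash, which can be inspected between worker runs to follow the migration's progress.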
Migration to test it out

Add as `ee/elastic/migrate/20250724114546_backfill_projects_index_test.rb`:

    # frozen_string_literal: true

    class BackfillProjectsIndexTest < Elastic::Migration
      include ::Search::Elastic::MigrationReindexBasedOnSchemaVersion

      batched!
      batch_size 5000
      throttle_delay 1.minute

      DOCUMENT_TYPE = Project
      NEW_SCHEMA_VERSION = 25_30
    end
diff
diff --git a/ee/app/workers/concerns/search/elastic/migration_reindex_based_on_schema_version.rb b/ee/app/workers/concerns/search/elastic/migration_reindex_based_on_schema_version.rb
index 1aae0402e768..565b52331d98 100644
--- a/ee/app/workers/concerns/search/elastic/migration_reindex_based_on_schema_version.rb
+++ b/ee/app/workers/concerns/search/elastic/migration_reindex_based_on_schema_version.rb
@@ -9,8 +9,8 @@ module MigrationReindexBasedOnSchemaVersion
include Search::Elastic::IndexName
UPDATE_BATCH_SIZE = 100
- QUEUE_THRESHOLD = 50_000
- SCROLL_TIMEOUT = '5m'
+ QUEUE_THRESHOLD = 10000
+ SCROLL_TIMEOUT = '1m'
def migrate
if completed?
diff --git a/ee/lib/elastic/latest/project_instance_proxy.rb b/ee/lib/elastic/latest/project_instance_proxy.rb
index 1bc3f770b693..c1498bba1164 100644
--- a/ee/lib/elastic/latest/project_instance_proxy.rb
+++ b/ee/lib/elastic/latest/project_instance_proxy.rb
@@ -5,7 +5,7 @@ module Latest
class ProjectInstanceProxy < ApplicationInstanceProxy
extend ::Gitlab::Utils::Override
- SCHEMA_VERSION = 25_06
+ SCHEMA_VERSION = 25_30
TRACKED_FEATURE_SETTINGS = %w[
issues_access_level
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.