
Use scroll API in schema version search migration helper

What does this MR do and why?

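This MR adds Elasticsearch scroll API support to the schema version search migration helper (`Search::Elastic::MigrationReindexBasedOnSchemaVersion`), alongside the existing search-based method of finding documents to migrate.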

The scroll API suits large data sets: it keeps a search context open on the cluster and returns results page by page from that context, so the migration can walk through every matching document without skipping or duplicating records between pages. The helper chooses the method automatically. It uses the scroll API when many documents remain to be processed or when an existing scroll session is in progress, and the regular search query otherwise.
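A minimal sketch of the selection rule and of paging through a scroll context, under stated assumptions. `use_scroll?`, its `threshold` parameter, `each_hit_via_scroll`, and `FakeScrollClient` are hypothetical names. The fake client imitates the Elasticsearch response shape (`_scroll_id`, `hits.hits`) so the example runs without a cluster; its simplified `search`/`scroll` signatures are not those of the elasticsearch-ruby client. It is also not confirmed here that the real helper compares against `QUEUE_THRESHOLD`; the inputs below only echo the numbers from the setup notes.

```ruby
# Hypothetical selection rule following the description: scroll when a
# session is already open or when many documents remain.
def use_scroll?(remaining:, threshold:, scroll_id: nil)
  !scroll_id.nil? || remaining > threshold
end

# Hypothetical in-memory stand-in for a search client. Each scroll_id maps
# to a cursor offset, the way a server-side scroll context remembers its
# position between requests.
class FakeScrollClient
  def initialize(docs, page_size)
    @docs = docs
    @page_size = page_size
    @cursors = {}
    @opened = 0
  end

  # Opens a new scroll context and returns its first page.
  def search(scroll:)
    @opened += 1
    scroll_id = "scroll-#{@opened}"
    @cursors[scroll_id] = 0
    page(scroll_id)
  end

  # Returns the next page of an existing context. A real cluster would also
  # renew the context's keep-alive to the `scroll` duration.
  def scroll(scroll_id:, scroll:)
    page(scroll_id)
  end

  private

  def page(scroll_id)
    offset = @cursors.fetch(scroll_id)
    hits = @docs[offset, @page_size] || []
    @cursors[scroll_id] = offset + hits.size
    { '_scroll_id' => scroll_id, 'hits' => { 'hits' => hits } }
  end
end

# Yields every hit exactly once by following the scroll cursor until an
# empty page comes back.
def each_hit_via_scroll(client, keep_alive: '1m')
  response = client.search(scroll: keep_alive)
  loop do
    hits = response['hits']['hits']
    break if hits.empty?

    hits.each { |hit| yield hit }
    response = client.scroll(scroll_id: response['_scroll_id'], scroll: keep_alive)
  end
end

puts use_scroll?(remaining: 16_000, threshold: 10_000) # prints "true"
puts use_scroll?(remaining: 16_000, threshold: 50_000) # prints "false"

client = FakeScrollClient.new((1..7).map { |i| { '_id' => i.to_s } }, 3)
ids = []
each_hit_via_scroll(client) { |hit| ids << hit['_id'] }
puts ids.join(',') # prints "1,2,3,4,5,6,7"
```

The loop stops on the first empty page rather than counting totals, which is the usual way to detect the end of a scroll.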

The changes also improve error handling by clearing scroll sessions once they are no longer needed, and add logging to track migration progress. The code was also refactored for modularity: the logic for each search type now lives in its own method.
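A minimal sketch of the cleanup idea, assuming a batched migration that keeps its scroll ID between runs. `RecordingClient`, `clear_scroll_context`, and `run_scroll_batch` are hypothetical; the fake client only mimics a `clear_scroll(body: { scroll_id: ... })` call shape (the real elasticsearch-ruby client does provide a `clear_scroll` method). The context is released when the scroll is exhausted or a batch fails, and kept otherwise so the next run can continue it.

```ruby
require 'logger'
require 'stringio'

# Hypothetical stand-in that records which scroll ids were cleared.
class RecordingClient
  attr_reader :cleared

  def initialize
    @cleared = []
  end

  def clear_scroll(body:)
    @cleared << body.fetch(:scroll_id)
  end
end

# Releases a scroll context and logs why.
def clear_scroll_context(client, logger, scroll_id, reason)
  client.clear_scroll(body: { scroll_id: scroll_id })
  logger.info("cleared scroll #{scroll_id} (#{reason})")
end

# Processes one batch from an open scroll. Returns the scroll_id to keep for
# the next run, or nil once the scroll is exhausted. On failure the context
# is cleared before the error propagates, so it is not left open until its
# keep-alive expires.
def run_scroll_batch(client, logger, scroll_id, hits)
  if hits.empty?
    clear_scroll_context(client, logger, scroll_id, 'no more results')
    return nil
  end

  begin
    hits.each { |hit| yield hit }
  rescue StandardError => e
    clear_scroll_context(client, logger, scroll_id, "batch failed: #{e.message}")
    raise
  end

  logger.info("processed #{hits.size} documents; keeping scroll #{scroll_id}")
  scroll_id
end

client = RecordingClient.new
logger = Logger.new(StringIO.new)

kept = run_scroll_batch(client, logger, 'abc', [{ '_id' => '1' }]) { |_hit| }
done = run_scroll_batch(client, logger, 'abc', []) { |_hit| }
begin
  run_scroll_batch(client, logger, 'def', [{ '_id' => '2' }]) { |_hit| raise 'boom' }
rescue RuntimeError
  # Expected: the error still reaches the caller after cleanup.
end
p [kept, done, client.cleared] # prints ["abc", nil, ["abc", "def"]]
```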

Additionally, some technical debt was cleaned up by standardizing how the code identifies the primary key field for each document type, which makes it more maintainable and consistent across data models.
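One hypothetical way to standardize the lookup is to ask each document class for its primary key and fall back to `'id'`, following the ActiveRecord convention where models expose `Model.primary_key`. The classes and helper names below are made-up stand-ins, not GitLab models, and this is not necessarily how the MR implements it.

```ruby
# Hypothetical plain-Ruby stand-ins that imitate ActiveRecord's
# `Model.primary_key` class method.
class ExampleProject
  def self.primary_key
    'id'
  end
end

class ExampleJoinRecord
  def self.primary_key
    'record_id'
  end
end

class ExampleNoKey; end

# Single place that decides which field identifies a document type.
def primary_key_field(document_type)
  key = document_type.primary_key if document_type.respond_to?(:primary_key)
  (key || 'id').to_s
end

# Extracts the database ids from search hits using that field.
def ids_from_hits(document_type, hits)
  field = primary_key_field(document_type)
  hits.map { |hit| hit.fetch('_source').fetch(field) }
end

puts primary_key_field(ExampleJoinRecord) # prints "record_id"
puts primary_key_field(ExampleNoKey)      # prints "id"
p ids_from_hits(ExampleJoinRecord, [{ '_source' => { 'record_id' => 7 } }]) # prints [7]
```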

References

Screenshots or screen recordings


How to set up and validate locally

  1. Set up Elasticsearch in GDK.
  2. I created a fake migration to test this (below). My index also has about 16,000 projects.
  3. I lowered the queue threshold and bumped the project index `SCHEMA_VERSION` to force the helper to use the scroll API (the diff below also shortens `SCROLL_TIMEOUT` to `'1m'`).
  4. To verify that the scroll API is used, run the migration worker and check the migration state in the migrations index.
  5. In a Rails console, kick off the migration worker:
     ```ruby
     Elastic::MigrationWorker.new.perform
     ```
  6. Verify the state in the migrations index:
     ```shell
     curl --request GET --url http://localhost:9200/gitlab-development-migrations/_doc/20250724114546
     ```
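For checking step 6 from Ruby instead of curl, a small sketch that fetches and summarizes the migration record. The helper names are hypothetical, and the shape of the record is an assumption: it expects `_source` to carry a boolean `completed` and a `state` hash whose keys depend on the migration, and returns `state` unchanged rather than guessing at those keys.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Fetches the migration record, mirroring the curl command in step 6.
# Defaults assume the local GDK setup from the steps above.
def fetch_migration_record(id, base_url: 'http://localhost:9200',
                           index: 'gitlab-development-migrations')
  Net::HTTP.get(URI("#{base_url}/#{index}/_doc/#{id}"))
end

# Assumption: the record's _source has a boolean `completed` and a `state`
# hash. The keys inside `state` vary by migration and are returned as-is.
def summarize_migration_record(json_body)
  doc = JSON.parse(json_body)
  source = doc.fetch('_source', {})
  {
    id: doc['_id'],
    completed: source['completed'] == true,
    state: source.fetch('state', {})
  }
end

# Usage (requires Elasticsearch running in GDK):
#   p summarize_migration_record(fetch_migration_record('20250724114546'))
```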
Migration to test it out

Add it as `ee/elastic/migrate/20250724114546_backfill_projects_index_test.rb`:

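```ruby
# frozen_string_literal: true

# Test-only migration used to exercise the schema version reindex helper.
class BackfillProjectsIndexTest < Elastic::Migration
  include ::Search::Elastic::MigrationReindexBasedOnSchemaVersion

  batched!
  batch_size 5000
  throttle_delay 1.minute

  DOCUMENT_TYPE = Project
  # Matches the bumped ProjectInstanceProxy::SCHEMA_VERSION in the diff below.
  NEW_SCHEMA_VERSION = 25_30
end
```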
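The helper's query for finding documents to migrate is not shown in this MR. Assuming it targets documents of `DOCUMENT_TYPE` whose `schema_version` is below `NEW_SCHEMA_VERSION`, a request body for the first scroll page could be built as in this sketch. The method name, the `type` field, and its `'project'` value are assumptions, and `size` is illustrative. Using `must_not` with a `gte` range also matches documents that have no `schema_version` at all, and `_doc` is the sort order Elasticsearch recommends for scroll requests.

```ruby
require 'json'

# Hypothetical builder for the "documents still on an old schema" query.
# 25_30 is the Ruby integer 2530 written with a digit separator.
def outdated_documents_query(type:, new_schema_version:, size:)
  {
    size: size,
    query: {
      bool: {
        filter: [{ term: { type: type } }],
        # Excludes up-to-date docs; docs missing schema_version still match.
        must_not: [{ range: { schema_version: { gte: new_schema_version } } }]
      }
    },
    # _doc order is the cheapest sort for scroll requests.
    sort: ['_doc']
  }
end

puts JSON.generate(outdated_documents_query(type: 'project', new_schema_version: 25_30, size: 5000))
```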
```diff
diff --git a/ee/app/workers/concerns/search/elastic/migration_reindex_based_on_schema_version.rb b/ee/app/workers/concerns/search/elastic/migration_reindex_based_on_schema_version.rb
index 1aae0402e768..565b52331d98 100644
--- a/ee/app/workers/concerns/search/elastic/migration_reindex_based_on_schema_version.rb
+++ b/ee/app/workers/concerns/search/elastic/migration_reindex_based_on_schema_version.rb
@@ -9,8 +9,8 @@ module MigrationReindexBasedOnSchemaVersion
       include Search::Elastic::IndexName
 
       UPDATE_BATCH_SIZE = 100
-      QUEUE_THRESHOLD = 50_000
-      SCROLL_TIMEOUT = '5m'
+      QUEUE_THRESHOLD = 10000
+      SCROLL_TIMEOUT = '1m'
 
       def migrate
         if completed?
diff --git a/ee/lib/elastic/latest/project_instance_proxy.rb b/ee/lib/elastic/latest/project_instance_proxy.rb
index 1bc3f770b693..c1498bba1164 100644
--- a/ee/lib/elastic/latest/project_instance_proxy.rb
+++ b/ee/lib/elastic/latest/project_instance_proxy.rb
@@ -5,7 +5,7 @@ module Latest
     class ProjectInstanceProxy < ApplicationInstanceProxy
       extend ::Gitlab::Utils::Override
 
-      SCHEMA_VERSION = 25_06
+      SCHEMA_VERSION = 25_30
 
       TRACKED_FEATURE_SETTINGS = %w[
         issues_access_level
```

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Terri Chu
