
Advanced Search reindexing: add support for retries

What does this MR do?

Related to #294210 (closed)

The Elasticsearch integration supports Zero Downtime Reindexing. Reindexing is currently split into one subtask per index (at present the production, issues, and notes indexes). If any subtask fails, the entire reindexing process is marked as failed.

This MR introduces automated retries for the subtasks. Each subtask uses the Elasticsearch Reindex API and is split into manual slices, with the maximum number of slices set to the index shard count * 2. Slices are started in batches of up to 60, and every run starts new slices so that 60 keep running at a time. Each slice is given a Task ID, which is used to look up its status in the Elasticsearch Tasks API. The task information determines whether the task is complete, whether it had errors, and whether all documents were processed. Each retry is tracked, and a hardcoded retry limit ensures that reindexing tasks do not retry forever.
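To make the flow above concrete, here is a minimal, self-contained sketch of the slice bookkeeping. It is not the code in this MR: only the two REINDEX_* constant names come from the MR, while `Slice`, `build_slices`, `start_pending`, `evaluate`, and the `MAX_RETRIES` value are hypothetical names and values chosen for illustration. The status hash is assumed to resemble an Elasticsearch Tasks API response (`completed`, `error`, and a `response` with `total`, `created`, `updated`, `deleted`, `failures`).

```ruby
# Illustrative sketch only; not the implementation in this MR.
# The two REINDEX_* names come from the MR; everything else is hypothetical.
REINDEX_MAX_TOTAL_SLICES_RUNNING = 60
REINDEX_SLICE_MULTIPLIER = 2
MAX_RETRIES = 3 # assumed value; the MR only says the limit is hardcoded

Slice = Struct.new(:number, :task_id, :retry_attempt, :state, keyword_init: true)

# Maximum slices for a subtask: index shard count * 2.
def build_slices(shard_count)
  Array.new(shard_count * REINDEX_SLICE_MULTIPLIER) do |n|
    Slice.new(number: n, task_id: nil, retry_attempt: 0, state: :pending)
  end
end

# One scheduler run: top the running set back up to the limit.
# The block stands in for the Reindex API call and returns a task ID.
def start_pending(slices, limit: REINDEX_MAX_TOTAL_SLICES_RUNNING)
  free = [limit - slices.count { |s| s.state == :running }, 0].max
  slices.select { |s| s.state == :pending }.first(free).each do |slice|
    slice.task_id = yield(slice)
    slice.state = :running
  end
end

# Decide what happens to a slice, given a Tasks API style status hash.
def evaluate(slice, status)
  return slice.state unless status['completed']

  response = status['response'] || {}
  errored = status.key?('error') || Array(response['failures']).any?
  processed = response.values_at('created', 'updated', 'deleted').compact.sum
  incomplete = processed != response['total'].to_i

  if !errored && !incomplete
    slice.state = :done
  elsif slice.retry_attempt < MAX_RETRIES
    slice.retry_attempt += 1
    slice.task_id = nil
    slice.state = :pending # picked up again by the next start_pending run
  else
    slice.state = :failed
  end
end
```

For example, `build_slices(5)` yields 10 slices, and calling `start_pending(slices, limit: 4) { |s| "node:#{s.number}" }` on each run keeps at most 4 of them running while failed or incomplete slices are put back in the queue until the retry limit is reached.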

Notes:

This MR stores REINDEX_MAX_TOTAL_SLICES_RUNNING and REINDEX_SLICE_MULTIPLIER as constants. I plan to create another MR that moves them into the database and exposes a UI element in the Admin area so the values can be adjusted for each reindexing process.

This MR keeps the existing code as LegacyReindexingService to support any in-progress reindexing during an upgrade. It will be removed in a follow-up issue, #329566 (closed).

Database Specific Info

up

$ bundle exec rails db:migrate:up VERSION=20210422195929
== 20210422195929 CreateElasticReindexingSlices: migrating ====================
-- table_exists?(:elastic_reindexing_slices)
   -> 0.0006s
-- create_table(:elastic_reindexing_slices, {})
-- quote_column_name(:elastic_task)
   -> 0.0000s
   -> 0.0188s
-- quote_table_name("check_ca30e1396e")
   -> 0.0000s
-- quote_table_name(:elastic_reindexing_slices)
   -> 0.0000s
-- execute("ALTER TABLE \"elastic_reindexing_slices\"\nADD CONSTRAINT \"check_ca30e1396e\" CHECK (char_length(\"elastic_task\") <= 255)\n")
   -> 0.0009s
== 20210422195929 CreateElasticReindexingSlices: migrated (0.1363s) ===========

$ bundle exec rails db:migrate:up VERSION=20210421190157
== 20210421190157 RemoveElasticTaskNullConstraintFromElasticReindexingSubtasks: migrating
-- execute("ALTER TABLE elastic_reindexing_subtasks\nDROP CONSTRAINT IF EXISTS check_aaf4e1bc37\n")
   -> 0.0009s
-- change_column_null(:elastic_reindexing_subtasks, :elastic_task, true)
   -> 0.0011s
== 20210421190157 RemoveElasticTaskNullConstraintFromElasticReindexingSubtasks: migrated (0.0087s)

down

$ bundle exec rails db:migrate:down VERSION=20210422195929
== 20210422195929 CreateElasticReindexingSlices: reverting ====================
-- drop_table(:elastic_reindexing_slices)
   -> 0.0194s
== 20210422195929 CreateElasticReindexingSlices: reverted (0.0195s) ===========

$ bundle exec rails db:migrate:down VERSION=20210421190157
== 20210421190157 RemoveElasticTaskNullConstraintFromElasticReindexingSubtasks: reverting
-- change_column_null(:elastic_reindexing_subtasks, :elastic_task, false, "elastic_task")
   -> 0.0046s
-- current_schema()
   -> 0.0002s
== 20210421190157 RemoveElasticTaskNullConstraintFromElasticReindexingSubtasks: reverted (0.0109s)

How to Test

Set up the Advanced Search integration using Elasticsearch (including creating the index and enabling the index for searching).

  • Navigate to Admin - Settings - Advanced Search
  • Scroll to Elasticsearch zero-downtime reindexing
  • Click the Trigger cluster reindexing button
  • Refresh the screen to see the reindexing process move along

Note: Once reindexing has started, you can speed up testing by opening a Rails console and running the ElasticClusterReindexingCronWorker manually:

ElasticClusterReindexingCronWorker.perform_async

Logs from a test where I manually set REINDEX_MAX_TOTAL_SLICES_RUNNING to 4:
{"severity":"INFO","time":"2021-04-26T15:21:31.198Z","correlation_id":null,"message":"Reindex task T4Oq4b9rR8aZ060Ll3ZrXw:26315 from gitlab-development-20210426-1110 to gitlab-development-20210426-1121 started for slice 0."}
{"severity":"INFO","time":"2021-04-26T15:21:31.205Z","correlation_id":null,"message":"Reindex task T4Oq4b9rR8aZ060Ll3ZrXw:26327 from gitlab-development-20210426-1110 to gitlab-development-20210426-1121 started for slice 1."}
{"severity":"INFO","time":"2021-04-26T15:21:31.210Z","correlation_id":null,"message":"Reindex task T4Oq4b9rR8aZ060Ll3ZrXw:26340 from gitlab-development-20210426-1110 to gitlab-development-20210426-1121 started for slice 2."}
{"severity":"INFO","time":"2021-04-26T15:21:31.217Z","correlation_id":null,"message":"Reindex task T4Oq4b9rR8aZ060Ll3ZrXw:26352 from gitlab-development-20210426-1110 to gitlab-development-20210426-1121 started for slice 3."}
{"severity":"INFO","time":"2021-04-26T15:24:08.364Z","correlation_id":null,"message":"Reindex task T4Oq4b9rR8aZ060Ll3ZrXw:30305 from gitlab-development-issues-20210426-1110 to gitlab-development-issues-20210426-1121 started for slice 0."}
{"severity":"INFO","time":"2021-04-26T15:24:08.370Z","correlation_id":null,"message":"Reindex task T4Oq4b9rR8aZ060Ll3ZrXw:30335 from gitlab-development-issues-20210426-1110 to gitlab-development-issues-20210426-1121 started for slice 1."}
{"severity":"INFO","time":"2021-04-26T15:24:08.376Z","correlation_id":null,"message":"Reindex task T4Oq4b9rR8aZ060Ll3ZrXw:30371 from gitlab-development-issues-20210426-1110 to gitlab-development-issues-20210426-1121 started for slice 2."}
{"severity":"INFO","time":"2021-04-26T15:24:08.383Z","correlation_id":null,"message":"Reindex task T4Oq4b9rR8aZ060Ll3ZrXw:30383 from gitlab-development-notes-20210426-1110 to gitlab-development-notes-20210426-1121 started for slice 0."}
{"severity":"INFO","time":"2021-04-26T15:24:08.388Z","correlation_id":null,"message":"Reindex task T4Oq4b9rR8aZ060Ll3ZrXw:30397 from gitlab-development-notes-20210426-1110 to gitlab-development-notes-20210426-1121 started for slice 1."}
{"severity":"INFO","time":"2021-04-26T15:24:08.394Z","correlation_id":null,"message":"Reindex task T4Oq4b9rR8aZ060Ll3ZrXw:30410 from gitlab-development-notes-20210426-1110 to gitlab-development-notes-20210426-1121 started for slice 2."}
{"severity":"INFO","time":"2021-04-26T15:24:08.402Z","correlation_id":null,"message":"Reindex task T4Oq4b9rR8aZ060Ll3ZrXw:30422 from gitlab-development-20210426-1110 to gitlab-development-20210426-1121 started for slice 4."}
{"severity":"INFO","time":"2021-04-26T15:24:08.408Z","correlation_id":null,"message":"Reindex task T4Oq4b9rR8aZ060Ll3ZrXw:30434 from gitlab-development-20210426-1110 to gitlab-development-20210426-1121 started for slice 5."}
{"severity":"INFO","time":"2021-04-26T15:24:08.414Z","correlation_id":null,"message":"Reindex task T4Oq4b9rR8aZ060Ll3ZrXw:30448 from gitlab-development-20210426-1110 to gitlab-development-20210426-1121 started for slice 6."}
{"severity":"INFO","time":"2021-04-26T15:26:37.962Z","correlation_id":null,"message":"Reindex task T4Oq4b9rR8aZ060Ll3ZrXw:35898 from gitlab-development-issues-20210426-1110 to gitlab-development-issues-20210426-1121 started for slice 3."}
{"severity":"INFO","time":"2021-04-26T15:26:37.968Z","correlation_id":null,"message":"Reindex task T4Oq4b9rR8aZ060Ll3ZrXw:35931 from gitlab-development-issues-20210426-1110 to gitlab-development-issues-20210426-1121 started for slice 4."}
{"severity":"INFO","time":"2021-04-26T15:26:37.979Z","correlation_id":null,"message":"Reindex task T4Oq4b9rR8aZ060Ll3ZrXw:35964 from gitlab-development-issues-20210426-1110 to gitlab-development-issues-20210426-1121 started for slice 5."}
{"severity":"INFO","time":"2021-04-26T15:26:37.986Z","correlation_id":null,"message":"Reindex task T4Oq4b9rR8aZ060Ll3ZrXw:35980 from gitlab-development-notes-20210426-1110 to gitlab-development-notes-20210426-1121 started for slice 3."}
{"severity":"INFO","time":"2021-04-26T15:26:37.992Z","correlation_id":null,"message":"Reindex task T4Oq4b9rR8aZ060Ll3ZrXw:36013 from gitlab-development-notes-20210426-1110 to gitlab-development-notes-20210426-1121 started for slice 4."}
{"severity":"INFO","time":"2021-04-26T15:26:37.998Z","correlation_id":null,"message":"Reindex task T4Oq4b9rR8aZ060Ll3ZrXw:36043 from gitlab-development-notes-20210426-1110 to gitlab-development-notes-20210426-1121 started for slice 5."}
{"severity":"INFO","time":"2021-04-26T15:26:38.004Z","correlation_id":null,"message":"Reindex task T4Oq4b9rR8aZ060Ll3ZrXw:36062 from gitlab-development-20210426-1110 to gitlab-development-20210426-1121 started for slice 7."}
{"severity":"INFO","time":"2021-04-26T15:26:38.010Z","correlation_id":null,"message":"Reindex task T4Oq4b9rR8aZ060Ll3ZrXw:36088 from gitlab-development-20210426-1110 to gitlab-development-20210426-1121 started for slice 8."}
{"severity":"INFO","time":"2021-04-26T15:26:38.017Z","correlation_id":null,"message":"Reindex task T4Oq4b9rR8aZ060Ll3ZrXw:36121 from gitlab-development-20210426-1110 to gitlab-development-20210426-1121 started for slice 9."}
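
When checking a test run, it can help to group these log lines by source index to see which slices were started for each subtask. The following is a rough sketch using only the Ruby standard library. The `STARTED` pattern and the `slices_started_by_index` helper are hypothetical and simply mirror the message format shown in the logs above; they are not part of this MR.

```ruby
require 'json'

# Hypothetical helper for reading the test logs; not part of this MR.
# The pattern mirrors the "Reindex task ... started for slice N." messages.
STARTED = /Reindex task (?<task>\S+) from (?<from>\S+) to (?<to>\S+) started for slice (?<slice>\d+)\./

# Returns a hash of source index name => slice numbers, in log order.
def slices_started_by_index(lines)
  lines.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |line, acc|
    next if line.strip.empty?

    message = JSON.parse(line)['message'].to_s
    match = STARTED.match(message)
    acc[match[:from]] << match[:slice].to_i if match
  end
end
```

Passing the log lines above (for example via `File.readlines` on a saved log file) returns one entry per source index, which makes it easier to confirm that every slice number was started.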

Screenshots (strongly suggested)

image

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team
Edited by Terri Chu
