Improve ClusterReindexingService states and logging

Description

We have encountered several problems with Elasticsearch (ES) migrations that use MigrationReindexTaskHelper, and there was not enough information or logging to understand them, e.g. !189044 (comment 2475476273) or !190508 (comment 2504271245).

After a discussion hour with the Global Search group members, we decided to improve the ClusterReindexingService class, because the current reindexing task states do not reflect the actual state of the system.

ReindexingTask States

Current states

  • Initial - indexing is paused here
  • Indexing paused - tasks kicked off to ES
  • Reindexing - monitoring, check for success at the end

New proposed states

  • Preflight check - validations run before any changes are made, so errors can still be returned to the user
  • Pause ES indexing
  • Set the index as read-only
  • Kick off reindexing tasks to ES
  • Reindexing monitoring
  • Check for success
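To make the proposed sequence concrete, here is a minimal Ruby sketch, assuming the states are strictly ordered and that any non-terminal state can fail. The class ReindexingStateMachine, the state symbols, and the JSON log payload are all hypothetical; they are not the existing ClusterReindexingService or ReindexingTask API. Each transition writes one structured log line with the previous and next state, which is the kind of information that was missing in the migration problems described above.

```ruby
require 'json'
require 'logger'

# Hypothetical sketch of the proposed state sequence. State names are
# illustrative only; they are not the existing ReindexingTask states.
class ReindexingStateMachine
  ORDERED_STATES = %i[
    preflight_check
    indexing_paused
    index_read_only
    reindexing_started
    reindexing_monitoring
    success_check
    success
  ].freeze
  TERMINAL_STATES = %i[success failure].freeze

  attr_reader :state

  def initialize(logger: Logger.new($stdout))
    @logger = logger
    @state = ORDERED_STATES.first
  end

  # Moves to the next state in ORDERED_STATES and logs the transition.
  # Raises a RuntimeError when called from a terminal state.
  def advance!
    raise "#{state} is a terminal state" if TERMINAL_STATES.include?(state)

    transition_to(ORDERED_STATES[ORDERED_STATES.index(state) + 1], level: :info)
  end

  # Any non-terminal state can fail; the reason is included in the log line.
  def fail!(reason)
    raise "#{state} is a terminal state" if TERMINAL_STATES.include?(state)

    transition_to(:failure, level: :error, reason: reason)
  end

  private

  # Single place where state changes happen, so every change is logged.
  def transition_to(new_state, level:, **extra)
    payload = { message: 'reindexing state transition', from: state, to: new_state }.merge(extra)
    @logger.public_send(level, payload.to_json)
    @state = new_state
  end
end
```

Logging every transition in one place, rather than in each state handler, means a stuck or failed task always leaves a trail showing the last state it reached.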

Implementation

We will make significant changes with the new states, but we don't want to break existing or new reindexing tasks. One possible approach is to version the reindexing task records using a new constant, similar to Elasticsearch's SCHEMA_VERSION, and save that version in the task itself (for example, in ReindexingTask#options). If the stored version doesn't match the current one, we can mark the task as failed. A failed reindexing is not a problem because the system falls back to the original index, and we can show a clear message to the user: "The reindexing task started in a previous version of the GitLab code, please retry it".
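A minimal Ruby sketch of the versioning check, assuming the version is stored under a 'version' key in the task options. The module ReindexingTaskVersioning, the VERSION constant, the 'version' key, and the Struct standing in for ReindexingTask are hypothetical illustrations, not existing GitLab code.

```ruby
# Hypothetical sketch: the module, the VERSION constant, the 'version' key,
# and the Struct stand-in for ReindexingTask are illustrative only.
module ReindexingTaskVersioning
  # Bump whenever the set or order of reindexing states changes.
  VERSION = 1
  RETRY_MESSAGE = 'The reindexing task started in a previous version of the GitLab code, please retry it'

  # Stand-in for a ReindexingTask record with an options hash.
  Task = Struct.new(:state, :options, :error_message, keyword_init: true)

  module_function

  # Store the current version in the task options when the task is created.
  def stamp(task)
    task.options = (task.options || {}).merge('version' => VERSION)
    task
  end

  # Marks the task as failed when its stored version differs from VERSION,
  # including tasks created before versioning existed (no version at all).
  # Returns true if the task was marked as failed.
  def fail_if_outdated!(task)
    return false if (task.options || {})['version'] == VERSION

    task.state = :failure
    task.error_message = RETRY_MESSAGE
    true
  end
end
```

Treating a missing version the same as a mismatched one means tasks created before this change are also failed safely and can be retried under the new state flow.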
