Elastic migration to reindex user index
What does this MR do and why?
Triggers reindexing of the User index as per this comment thread.
Creates two Elastic migrations:
- Removes the fields that were dropped from the user index in "Fixes N+1 queries when users are indexed into e..." (!110915 - merged)
- Reindexes the user index
In "Index user emails with an email tokenizer" (!110614 - merged) we added an analyzer and tokenizer for emails. For Elastic to index documents with this new analyzer, the analyzer and tokenizer must first exist in the index settings. That requires building a new index, and the easiest way to do that is zero-downtime reindexing, which creates a new index and then moves the documents from the old index to the new one.
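To make the "new index" requirement concrete: Elasticsearch only lets new analyzers be defined on a closed index, and an existing field cannot switch to a different analyzer in place, so the analysis settings have to be part of the index when it is created. The sketch below shows the kind of create-index body involved. It is an illustration only: this MR does not say which tokenizer !110614 uses, so the built-in `uax_url_email` tokenizer is an assumption, and the names `email_tokenizer`, `email_analyzer` and the `email` field are hypothetical.

```ruby
require "json"

# Hypothetical analysis settings. uax_url_email is an assumed choice of
# built-in tokenizer (it keeps e-mail addresses as single tokens); the
# tokenizer/analyzer names are illustrative, not GitLab's.
def email_analysis_settings
  {
    analysis: {
      tokenizer: {
        email_tokenizer: { type: "uax_url_email" }
      },
      analyzer: {
        email_analyzer: {
          type: "custom",
          tokenizer: "email_tokenizer",
          filter: ["lowercase"]
        }
      }
    }
  }
end

# Hypothetical mapping pointing an email field at the analyzer above.
def email_field_mapping
  { properties: { email: { type: "text", analyzer: "email_analyzer" } } }
end

# Body for PUT /<new-index>: settings and mappings are supplied together when
# the index is created, which is why a reindex is needed to pick them up.
def new_index_body
  { settings: email_analysis_settings, mappings: email_field_mapping }
end
```

Serializing `new_index_body` with `JSON.generate` gives a request body that can be sent when creating the target index of a reindex.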
Removing old fields
Created a new migration helper that checks for documents containing the removed fields and updates them in batches using `update_by_query`. Also updated the documentation with instructions for using the new helper.
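The helper's code is not reproduced here. As a minimal sketch, assuming plain Elasticsearch REST calls, this is the shape of request such a helper could send for each batch. The method names are hypothetical; `_update_by_query`, the `exists` query, the `max_docs` and `conflicts` parameters and Painless scripts are standard Elasticsearch. The field names are the two this MR expects to disappear.

```ruby
require "uri"

# Fields this MR expects to be gone from the user index.
REMOVED_FIELDS = %w[two_factor_enabled has_projects].freeze

# Matches any document that still contains at least one removed field.
def documents_with_fields_query(fields)
  {
    bool: {
      should: fields.map { |field| { exists: { field: field } } },
      minimum_should_match: 1
    }
  }
end

# Body for POST /<index>/_update_by_query. The Painless script removes every
# field listed in params.fields from the document's _source.
def remove_fields_body(fields)
  {
    query: documents_with_fields_query(fields),
    script: {
      lang: "painless",
      source: "for (def field : params.fields) { ctx._source.remove(field); }",
      params: { fields: fields }
    }
  }
end

# Path for one batch: max_docs caps how many documents this call updates, and
# conflicts=proceed skips version conflicts instead of aborting the batch.
def update_by_query_path(index, batch_size)
  query = URI.encode_www_form(conflicts: "proceed", max_docs: batch_size)
  "/#{index}/_update_by_query?#{query}"
end
```

A migration built this way would repeat the batch call, checking `_count` with the same query between runs, until no documents match and the migration can be marked completed.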
Reindexing
The reindex is done through an Elastic migration so that self-managed customers also benefit from the new analyzer.
The migration creates an `Elastic::ReindexingTask` record, which is processed by `ElasticClusterReindexingCronWorker`.
Caveat: `Elastic::ClusterReindexingService` has a guard that fails if there are any pending migrations, and this reindexing is kicked off by a migration. The migration therefore has to mark itself as completed immediately after creating the task; otherwise `ClusterReindexingService` will fail the reindexing.
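The ordering constraint in the caveat can be modelled in a few lines. This is a hypothetical, in-memory sketch: none of these classes are GitLab's, and it only mimics the guard described above to show why the migration must flip its own `completed` flag before the reindexing task is processed.

```ruby
# Hypothetical model of the guard; not GitLab code.
class PendingMigrationsError < StandardError; end

MigrationRecord = Struct.new(:version, :completed)

class FakeReindexingService
  def initialize(migrations)
    @migrations = migrations
  end

  # Mimics the guard: refuse to reindex while any migration is pending.
  def execute(task)
    pending = @migrations.reject(&:completed)
    if pending.any?
      raise PendingMigrationsError, "pending: #{pending.map(&:version).join(', ')}"
    end

    task[:state] = :reindexing
  end
end

class FakeReindexUsersMigration
  def initialize(record, tasks, mark_completed_immediately:)
    @record = record
    @tasks = tasks
    @mark_completed_immediately = mark_completed_immediately
  end

  # Queue a reindexing task, then optionally mark this migration completed
  # right away so the pending-migrations guard lets the task run.
  def migrate
    @tasks << { target: "users", state: :initial }
    @record.completed = true if @mark_completed_immediately
  end
end

# Runs the migration, then the "cron" step; returns the task state or the error.
def simulate(mark_completed_immediately:)
  record = MigrationRecord.new(:reindex_users, false)
  tasks = []
  FakeReindexUsersMigration.new(record, tasks, mark_completed_immediately: mark_completed_immediately).migrate
  FakeReindexingService.new([record]).execute(tasks.first)
  tasks.first[:state]
rescue PendingMigrationsError => e
  e.message
end
```

With `mark_completed_immediately: false`, the migration's own pending status blocks the task it just created, which is the deadlock the caveat avoids.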
Logs
How to set up and validate locally
- Ensure Elasticsearch is running.
- Find a user index that was created before 2023-02-02, or create a new index using `master` from before that date. This ensures the index is in the same state as the one currently in production, with the `two_factor_enabled` and `has_projects` fields and mappings.
- Check out this branch.
- Get the current index for the `gitlab-development-users` alias: `curl "http://localhost:9200/_cat/aliases/gitlab-development-users?h=i"`
- In a Rails console, run `Elastic::MigrationWorker.new.perform` a few times.
- Run `ElasticClusterReindexingCronWorker.new.perform` 4 times.
- Run `Elastic::MigrationWorker.new.perform` a few times.
- Verify that the first migration has `completed == true`: `curl "http://localhost:9200/gitlab-development-migrations/_doc/20230208090000" | jq '."_source"'`
- Verify that the second migration has `completed == true`: `curl "http://localhost:9200/gitlab-development-migrations/_doc/20230208100000" | jq '."_source"'`
- Check that the mappings for `two_factor_enabled` and `has_projects` no longer exist: `curl "http://localhost:9200/gitlab-development-users/_mapping"`
- Get the new index for the `gitlab-development-users` alias: `curl "http://localhost:9200/_cat/aliases/gitlab-development-users?h=i"`
- Verify that the index name is different from the one returned by the earlier "Get the current index" step.
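The final checks can also be scripted. This is a minimal sketch using only Ruby's standard library (`net/http`, `json`), assuming Elasticsearch at `http://localhost:9200` with no authentication, as in the `curl` commands. The helper names are hypothetical, and the script runs outside the Rails app.

```ruby
require "json"
require "net/http"
require "uri"

ES_URL = "http://localhost:9200" # local Elasticsearch, as in the curl commands

def get_json(path)
  JSON.parse(Net::HTTP.get(URI("#{ES_URL}#{path}")))
end

# True when a migration document's _source reports completed == true.
def migration_completed?(doc)
  doc.dig("_source", "completed") == true
end

# GET /<alias>/_mapping is keyed by concrete index name; check that none of the
# given fields appear in the top-level properties of any backing index.
def fields_absent?(mapping_response, fields)
  mapping_response.values.all? do |index|
    properties = index.dig("mappings", "properties") || {}
    (properties.keys & fields).empty?
  end
end

# index_before: the index name noted before running the migrations.
def verify!(index_before)
  %w[20230208090000 20230208100000].each do |version|
    doc = get_json("/gitlab-development-migrations/_doc/#{version}")
    raise "migration #{version} not completed" unless migration_completed?(doc)
  end

  mapping = get_json("/gitlab-development-users/_mapping")
  raise "old fields still mapped" unless fields_absent?(mapping, %w[two_factor_enabled has_projects])

  index_after = Net::HTTP.get(URI("#{ES_URL}/_cat/aliases/gitlab-development-users?h=i")).strip
  raise "index was not swapped" if index_after == index_before

  index_after
end
```

Call `verify!` with the index name from the earlier step; it returns the new index name or raises on the first failed check.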
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #389841 (closed)