Elastic migration to reindex user index

Madelein van Niekerk requested to merge 389841-migration-reindex-user into master

What does this MR do and why?

Triggers reindexing of the User index as per this comment thread.

Creates two Elastic migrations:

  1. Removes the fields that were dropped from the user index in Fixes N+1 queries when users are indexed into e... (!110915 - merged).
  2. Reindexes the user index.

In Index user emails with an email tokenizer (!110614 - merged) we added an analyzer and tokenizer for emails. For Elastic to index documents with this new analyzer, the tokenizer has to be created first. This requires a new index to be built, and the easiest way to do that is zero-downtime reindexing, which creates a new index and then moves the documents from the old index to the new one.

Removing old fields

Created a new migration helper that checks for documents still containing the removed fields and updates them in batches using update_by_query. Also updated the documentation with instructions for the new helper.
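As a rough illustration of the kind of request such a helper might send, the sketch below builds an update_by_query body that matches documents still holding any of the obsolete fields and strips them with a Painless script. The method name `remove_fields_body` is hypothetical and is not the helper added in this MR; how the real helper sizes its batches is not shown here.

```ruby
require 'json'

# Hypothetical helper (not the one from this MR): builds an
# _update_by_query request body that removes the given fields from
# every document that still contains at least one of them.
def remove_fields_body(fields)
  {
    query: {
      bool: {
        # Match any document that still has one of the obsolete fields.
        should: fields.map { |field| { exists: { field: field } } },
        minimum_should_match: 1
      }
    },
    script: {
      lang: 'painless',
      # Remove each listed key from the document source.
      source: 'for (def f : params.fields) { ctx._source.remove(f) }',
      params: { fields: fields }
    }
  }
end

body = remove_fields_body(%w[two_factor_enabled has_projects])
puts JSON.pretty_generate(body)
```

Because the query only matches documents that still contain a field, repeating the request until nothing matches is one way to tell that the cleanup has finished.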

Reindexing

The reindexing is implemented as an Elastic migration so that self-managed customers can also benefit from the new analyzer.

The migration creates an Elastic::ReindexingTask record, which is then processed by ElasticClusterReindexingCronWorker.

Caveat: Elastic::ClusterReindexingService has a guard that fails the reindexing if there are any pending migrations, and this reindexing is itself kicked off by a migration. We therefore need to mark the migration as completed immediately after creating the task; otherwise ClusterReindexingService will fail the reindexing.
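The ordering problem in the caveat can be illustrated with a toy model. Everything below is a stand-in written for this illustration: `Migration` and `ToyReindexingService` are hypothetical and do not reflect the real GitLab classes; only the two migration versions are taken from the validation steps in this MR.

```ruby
# Toy stand-ins; none of these are GitLab's real classes.
Migration = Struct.new(:version, :completed)

class ToyReindexingService
  def initialize(migrations)
    @migrations = migrations
  end

  # Mirrors the guard described above: refuse to run while any
  # migration is still pending.
  def execute
    pending = @migrations.reject(&:completed)
    return :failed_pending_migrations unless pending.empty?

    :reindexing_started
  end
end

migrations = [
  Migration.new(20230208090000, true),  # field-removal migration, done
  Migration.new(20230208100000, false)  # reindexing migration, still pending
]
service = ToyReindexingService.new(migrations)

# The task was created but the migration is still pending, so the guard trips.
result_before = service.execute

# Marking the migration completed right after creating the task clears the guard.
migrations.last.completed = true
result_after = service.execute

puts result_before
puts result_after
```

In this model the first call is rejected and the second proceeds, which is why the migration has to be marked as completed before the cron worker picks up the task.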

Logs

Screenshot_2023-02-08_at_15.05.33

How to set up and validate locally

  1. Ensure Elasticsearch is running
  2. Find a user index that was created before 2023-02-02, or create a new index using master from before that date. This ensures the index is in the same state as the one currently in production, with the two_factor_enabled and has_projects fields and mappings.
  3. Check out this branch.
  4. Get the current index for the gitlab-development-users alias: curl "http://localhost:9200/_cat/aliases/gitlab-development-users?h=i"
  5. In a Rails console, run Elastic::MigrationWorker.new.perform a few times.
  6. Run ElasticClusterReindexingCronWorker.new.perform 4 times.
  7. Run Elastic::MigrationWorker.new.perform a few times.
  8. Verify that the first migration has completed == true: curl "http://localhost:9200/gitlab-development-migrations/_doc/20230208090000" | jq '."_source"'
  9. Verify that the second migration has completed == true: curl "http://localhost:9200/gitlab-development-migrations/_doc/20230208100000" | jq '."_source"'
  10. Check that the mappings for two_factor_enabled and has_projects no longer exist: curl "http://localhost:9200/gitlab-development-users/_mapping"
  11. Get the new index for the gitlab-development-users alias: curl "http://localhost:9200/_cat/aliases/gitlab-development-users?h=i"
  12. Verify that the index name is different from the one returned in step 4.
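The checks in steps 8 to 10 can also be scripted. The sketch below is a minimal example that parses the JSON returned by the curl commands above; the method names are hypothetical, the sample responses are made up for illustration, and the mapping layout (index name, then mappings, then properties) assumes the standard Elasticsearch _mapping response.

```ruby
require 'json'

# True when the migration document's _source marks it as completed.
def migration_completed?(doc_json)
  JSON.parse(doc_json).dig('_source', 'completed') == true
end

# Returns the subset of `fields` that still appear in the mapping of
# any index contained in the _mapping response.
def leftover_fields(mapping_json, fields)
  JSON.parse(mapping_json).flat_map do |_index, body|
    properties = body.dig('mappings', 'properties') || {}
    fields.select { |field| properties.key?(field) }
  end.uniq
end

# Made-up sample responses, for illustration only.
sample_doc = '{"_id":"20230208090000","_source":{"completed":true}}'
sample_mapping = '{"some-users-index":{"mappings":{"properties":{"email":{"type":"text"}}}}}'

puts migration_completed?(sample_doc)
puts leftover_fields(sample_mapping, %w[two_factor_enabled has_projects]).inspect
```

To use it against a local instance, pass in the output of the corresponding curl command instead of the sample strings.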

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #389841 (closed)

Edited by Madelein van Niekerk
