
Index user emails with an email tokenizer

Madelein van Niekerk requested to merge 389841-bug-fix-elastic-email-search into master

What does this MR do and why?

Fixes a bug where searching by email address returns no results when advanced user search is enabled.

Current behaviour

Users are not returned in search results when searching by email address. This worked before user search was moved to Advanced Search.

Expected behaviour

Searching with an email address should return the user with that address.

Fix

Analyze the email field with an email-aware tokenizer so that each address is indexed as a single token instead of being broken into several tokens. Use Elasticsearch's UAX URL Email Tokenizer (uax_url_email), as suggested by @john-mason.

*This will require the user index to be reindexed*
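As a rough illustration of the fix, the Ruby sketch below builds index settings in which the email field uses a custom analyzer backed by Elasticsearch's built-in uax_url_email tokenizer. It is not the code from this MR: the tokenizer name email_tokenizer, the lowercase filter, the text field type and the hash layout are assumptions. Only the analyzer name email_analyzer matches the mapping output expected in the validation steps.

```ruby
require 'json'

# Hypothetical index settings, not GitLab's actual implementation.
# The email field is analyzed with a custom analyzer whose tokenizer is the
# built-in uax_url_email tokenizer, which keeps URLs and email addresses
# as single tokens.
EMAIL_INDEX_SETTINGS = {
  settings: {
    analysis: {
      tokenizer: {
        # Assumed name; only the analyzer name below is confirmed by the MR.
        email_tokenizer: { type: 'uax_url_email' }
      },
      analyzer: {
        email_analyzer: {
          type: 'custom',
          tokenizer: 'email_tokenizer',
          # Assumption: lowercase tokens so stored addresses are case-normalized.
          filter: ['lowercase']
        }
      }
    }
  },
  mappings: {
    properties: {
      # A text field is required here: keyword fields take a normalizer,
      # not an analyzer.
      email: { type: 'text', analyzer: 'email_analyzer' }
    }
  }
}.freeze

# Print the JSON body that could be sent when creating the index.
puts JSON.pretty_generate(EMAIL_INDEX_SETTINGS) if __FILE__ == $PROGRAM_NAME
```

Term queries are not analyzed, so the term query used in the validation steps only matches when the indexed field holds the whole address as one token. The uax_url_email tokenizer produces exactly such a token, whereas a tokenizer that splits on the @ would not.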


How to set up and validate locally

  1. Ensure Elasticsearch is set up. If the indices need to be created first, create them from master rather than from this branch.
  2. Enable advanced user search in a Rails console: Feature.enable(:advanced_user_search)
  3. Check the current mapping of the email field: curl http://localhost:9200/gitlab-development-users/_mapping
  4. Search for a user by email and confirm that no results are returned, e.g. curl http://localhost:9200/gitlab-development-users/_search --data '{"query": {"term": {"email": {"value": "seeded@user14.com"}}}}' --header "Content-Type: application/json" | jq '.hits'
  5. Start a reindex of the user index in a Rails console: Elastic::ReindexingTask.create!(targets: %w[User])
  6. Run the reindexing cron worker three times to complete the reindex: ElasticClusterReindexingCronWorker.new.perform
  7. Check the mapping of the email field again: curl http://localhost:9200/gitlab-development-users/_mapping. It should now include "analyzer" : "email_analyzer".
  8. Repeat the email search and confirm that the user is now returned, e.g. curl http://localhost:9200/gitlab-development-users/_search --data '{"query": {"term": {"email": {"value": "seeded@user14.com"}}}}' --header "Content-Type: application/json" | jq '.hits'
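The search checks in steps 4 and 8 can also be scripted. The following is a minimal Ruby sketch using only the standard library (net/http, uri and json). It sends the same term query as the curl commands and returns the hits object. The helper names are hypothetical, and the defaults assume Elasticsearch on localhost:9200 with the gitlab-development-users index.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: the same term query body used in the curl commands.
def email_term_query(email)
  { query: { term: { email: { value: email } } } }
end

# Hypothetical helper: the _search endpoint for the users index.
def users_search_uri(host = 'http://localhost:9200', index = 'gitlab-development-users')
  URI("#{host}/#{index}/_search")
end

# POSTs the term query (curl --data also sends a POST) and returns the
# "hits" object of the search response.
def search_users_by_email(email, uri: users_search_uri)
  req = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json')
  req.body = JSON.generate(email_term_query(email))
  res = Net::HTTP.start(uri.hostname, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(req)
  end
  raise "Elasticsearch returned HTTP #{res.code}" unless res.is_a?(Net::HTTPSuccess)

  JSON.parse(res.body)['hits']
end
```

For example, search_users_by_email('seeded@user14.com')['hits'] should be empty before the reindex and should contain the seeded user afterwards.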

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #389841 (closed)
