Index user emails with an email tokenizer
What does this MR do and why?
Fixes a bug where email searches don't work for advanced user search.
Current behaviour
Users are not returned in search results when searching by email address. This worked before we moved user search to Advanced Search.
Expected behaviour
Searching with an email address should return the user with that address.
Fix
Use an email tokenizer for email fields so that each email address is kept as a single token when it is indexed and searched. Use the UAX URL Email Tokenizer (uax_url_email in Elasticsearch) as suggested by @john-mason:
![](/-/project/278964/uploads/991c2cc3286f87094701f2dc3297ef0b/Screenshot_2023-01-31_at_10.14.37.png)
*This will require the user index to be reindexed*
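As a rough illustration of the fix, the settings and mapping could look something like the sketch below. Only the analyzer name `email_analyzer` comes from this MR; the `lowercase` filter, the `text` field type, and the helper name `email_index_config` are assumptions made for the example and are not taken from the actual diff. `uax_url_email` is the built-in Elasticsearch tokenizer.

```ruby
require 'json'

# Hypothetical helper: returns a sketch of index settings and mapping
# that keep an email address as a single token.
def email_index_config
  {
    settings: {
      analysis: {
        analyzer: {
          # Analyzer name as seen in the mapping after the reindex.
          email_analyzer: {
            type: 'custom',
            tokenizer: 'uax_url_email', # built-in Elasticsearch tokenizer
            filter: ['lowercase']       # assumption, not confirmed by the MR
          }
        }
      }
    },
    mappings: {
      properties: {
        # Field type is an assumption; analyzers apply to text fields.
        email: { type: 'text', analyzer: 'email_analyzer' }
      }
    }
  }
end

# Print the JSON that would be sent to Elasticsearch.
puts JSON.pretty_generate(email_index_config)
```

Because the analyzer is applied at index time, existing documents keep their old tokens until the index is rebuilt, which is why the reindex is needed.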
Screenshots or screen recordings
Before
![](/-/project/278964/uploads/afa288223939de31f8124c3189a59866/Screenshot_2023-01-31_at_10.25.43.png)
After
![](/-/project/278964/uploads/b68f48a0627e6a5ff8896cf65d53e3bc/Screenshot_2023-01-31_at_10.26.03.png)
How to set up and validate locally
- Ensure Elasticsearch is set up (if new indices need to be created first, use master and not this branch).
- Enable advanced user search:
Feature.enable(:advanced_user_search)
- Take note of the mapping for the email field:
curl http://localhost:9200/gitlab-development-users/_mapping
- Perform a user search with an email and note that there are no results. E.g.
curl http://localhost:9200/gitlab-development-users/_search --data '{"query": {"term": {"email": {"value": "seeded@user14.com"}}}}' --header "Content-Type: application/json" | jq '.hits'
- Run a reindex of the user index in a rails console:
Elastic::ReindexingTask.create!(targets: %w[User])
- Run the reindex worker 3 times to complete the reindex:
ElasticClusterReindexingCronWorker.new.perform
- Take note of the mapping for the email field:
curl http://localhost:9200/gitlab-development-users/_mapping
- It should now have "analyzer": "email_analyzer".
- Perform a user search with an email and see that there is now a result being returned. E.g.
curl http://localhost:9200/gitlab-development-users/_search --data '{"query": {"term": {"email": {"value": "seeded@user14.com"}}}}' --header "Content-Type: application/json" | jq '.hits'
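The same checks can be run from a Ruby console instead of curl. This is a minimal sketch, assuming the local cluster URL and index name used in the commands above. The helper names (`email_term_query`, `analyze_body`, `post_json`) are hypothetical and not part of GitLab. The `_analyze` call is an extra check, not one of the steps above: it shows which tokens the analyzer produces for a given email.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build the same term query used in the curl commands above.
def email_term_query(email)
  { query: { term: { email: { value: email } } } }
end

# Build a request body for the Elasticsearch _analyze API.
# `email_analyzer` is the analyzer name shown in the mapping after the reindex.
def analyze_body(text, analyzer: 'email_analyzer')
  { analyzer: analyzer, text: text }
end

# POST a JSON body to an endpoint of the users index and parse the response.
# The defaults mirror the URL and index name from the curl examples.
def post_json(path, body, base_url: 'http://localhost:9200', index: 'gitlab-development-users')
  uri = URI.join(base_url, "/#{index}/#{path}")
  response = Net::HTTP.post(uri, JSON.generate(body), 'Content-Type' => 'application/json')
  JSON.parse(response.body)
end

# Usage (requires a running local cluster):
#   post_json('_analyze', analyze_body('seeded@user14.com'))['tokens'].map { |t| t['token'] }
#   post_json('_search', email_term_query('seeded@user14.com'))['hits']
```

After the reindex, the `_analyze` call should list the whole address as one token, which is what lets the `term` query match.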
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #389841 (closed)
Edited by Madelein van Niekerk