Search results inconsistent for camel case text


Summary

Code search (using Advanced search) returns different results for camel case text depending on the letter case of the search term.

Customer support ticket (internal): https://gitlab.zendesk.com/agent/tickets/487811

Steps to reproduce

  1. Enable Advanced search in GDK for indexing and search
  2. Index everything:
    bundle exec rake gitlab:elastic:index
  3. Find a project to add files to
  4. Add five files:
    • filename: test1.java, content: public void helloWorldFooBar(Boolean test) throws Exception;
    • filename: test2.java, content: testService.helloWorldFooBar(True);
    • filename: test1.md, content: helloWorldFooBar
    • filename: test2.md, content: helloworldfoobar
    • filename: test3.md, content: helloWorldFooBar(False)
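
To make the reproduction repeatable, the five files could be generated with a short script. This is only a sketch: the helper name write_test_files and the target directory are hypothetical, and the file names and contents are copied from the list above.

```python
from pathlib import Path

# File names and contents copied from the reproduction steps above.
TEST_FILES = {
    "test1.java": "public void helloWorldFooBar(Boolean test) throws Exception;",
    "test2.java": "testService.helloWorldFooBar(True);",
    "test1.md": "helloWorldFooBar",
    "test2.md": "helloworldfoobar",
    "test3.md": "helloWorldFooBar(False)",
}


def write_test_files(target_dir):
    """Write the five reproduction files into target_dir and return their paths.

    target_dir is a hypothetical local checkout of the project under test.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, content in TEST_FILES.items():
        path = target / name
        path.write_text(content + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

The files still need to be committed and pushed to the project, and the indexing has to finish, before the searches below will see them.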

What is the current bug behavior?

  1. Perform a project code search (verify that Advanced search is enabled)
  2. Search with helloWorldFooBar
  3. Get back 5 results
  4. Search with helloworldfoobar
  5. Get back 2 results
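
The same comparison can probably be scripted against the project Search API (scope=blobs) instead of the UI. A minimal sketch, assuming a reachable instance, a numeric project ID, and a personal access token; the function names are hypothetical, and the count covers only the first page of results.

```python
import json
import urllib.parse
import urllib.request


def blob_search_url(base_url, project_id, term):
    """Build the project-level code (blobs) search URL for a term."""
    query = urllib.parse.urlencode({"scope": "blobs", "search": term})
    return f"{base_url.rstrip('/')}/api/v4/projects/{project_id}/search?{query}"


def count_blob_results(base_url, project_id, term, token):
    """Return the number of blob results on the first page for a term.

    Requires network access to the instance; token is a personal access token.
    """
    request = urllib.request.Request(
        blob_search_url(base_url, project_id, term),
        headers={"PRIVATE-TOKEN": token},
    )
    with urllib.request.urlopen(request) as response:
        return len(json.load(response))
```

Calling count_blob_results once with helloWorldFooBar and once with helloworldfoobar should show whether the two counts differ in the same way as in the UI.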

What is the expected correct behavior?

Searches for camel case text should return the same results regardless of the letter case used in the search term.

Possible fixes

Code content is indexed using the Elasticsearch word delimiter graph token filter. The inconsistency appears to occur only when the camel case text is followed by a non-alphanumeric character, such as the "(" in test1.java, test2.java, and test3.md.
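
One way to investigate is to compare the tokens Elasticsearch produces for the file contents with the Analyze API. The sketch below is an approximation, not the actual GitLab index settings: the whitespace tokenizer, the preserve_original option, and the lowercase filter are assumptions, and the function names are hypothetical.

```python
import json
import urllib.request

# Assumed approximation of the code analyzer; the real index settings may differ.
ASSUMED_FILTERS = [
    {"type": "word_delimiter_graph", "preserve_original": True},
    "lowercase",
]


def build_analyze_body(text):
    """Build an _analyze request body that uses the assumed filter chain."""
    return {"tokenizer": "whitespace", "filter": ASSUMED_FILTERS, "text": text}


def analyze(es_url, text):
    """POST the text to <es_url>/_analyze and return the list of token strings.

    Requires a reachable Elasticsearch node, for example the one used by GDK.
    """
    request = urllib.request.Request(
        f"{es_url.rstrip('/')}/_analyze",
        data=json.dumps(build_analyze_body(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return [t["token"] for t in json.load(response)["tokens"]]
```

Comparing the token lists for helloWorldFooBar and helloWorldFooBar(False) should show whether a lowercased helloworldfoobar token is emitted in both cases, which would confirm or rule out the filter as the cause.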
