Elasticsearch "Exact Term" search is broken when the term contains special characters.
Summary
According to the advanced search docs, an exact term search is done by wrapping the search string in double quotes. This does not work when the string contains special characters.
In our case, we were searching for "us-east-2". All of the files returned contain the string, but the match is not the part brought into focus in the search UI. We tried escaping the - characters as described in the docs, but that made no difference.
Screenshots. Notice how "us-east-2" isn't shown in the File preview areas.
Examining the queries sent to Elasticsearch revealed the issue. It appears that the Elasticsearch analyzer splits "us-east-2" into three separate tokens: "us", "east", and "2". Elasticsearch then returns the first instance of any of those tokens as the "highlight" payload. In most of the above cases, the first match is "us".
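To make the suspected behavior concrete, here is a rough Python sketch. It is only an approximation: `tokenize` and `first_highlight` are hypothetical helpers that split on non-alphanumeric characters, and they are not GitLab's or Elasticsearch's actual analyzer or highlighter. The point is just that once the quoted term is split into tokens, the first hit in the file is the standalone "us" rather than the full string.

```python
import re

def tokenize(text):
    """Split on runs of non-alphanumeric characters and lowercase the pieces.
    Assumption: this only approximates how a word-splitting analyzer treats "-"."""
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", text) if t]

def first_highlight(document, query):
    """Return (line_number, token) for the first line containing any query token."""
    query_tokens = set(tokenize(query))
    for line_number, line in enumerate(document.splitlines(), start=1):
        for token in tokenize(line):
            if token in query_tokens:
                return line_number, token
    return None

# "us" appears near the top, "us-east-2" only at the very bottom.
document = "Tell us the region\n" + "placeholder\n" * 50 + "region = us-east-2\n"

print(tokenize('"us-east-2"'))                  # the quoted term falls apart into tokens
print(first_highlight(document, '"us-east-2"'))  # the hit lands on line 1, not the last line
```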
Steps to reproduce
- Set up and enable Elasticsearch.
- Create a file with the word "us" towards the top and "us-east-2" at the bottom. They should be separated by many lines.
- Re-index so that the new file is included.
- Run the search for "us-east-2".
- Notice the search UI behavior described above.
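A test file for the steps above can be generated with a few lines of Python. This is just a convenience sketch: the function name `write_repro_file`, the file name `regions.txt`, and the count of 200 filler lines are arbitrary choices, not anything GitLab requires. The filler text deliberately avoids "us", "east", and "2".

```python
import tempfile
from pathlib import Path

def write_repro_file(path, filler_lines=200):
    """Write a file with "us" on the first line and "us-east-2" on the last,
    separated by filler lines that contain none of the split tokens."""
    lines = ["us"] + ["placeholder text"] * filler_lines + ["us-east-2"]
    Path(path).write_text("\n".join(lines) + "\n")
    return lines

# Write into a temporary directory; copy the file into a repository before re-indexing.
repro_path = Path(tempfile.mkdtemp()) / "regions.txt"
lines = write_repro_file(repro_path)
print(repro_path)
```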
What is the current bug behavior?
- The search UI does not focus the matched string. Instead, it focuses the first occurrence of any of the split tokens ("us", "east", or "2").
What is the expected correct behavior?
- Search should only match the full search string "us-east-2"
- Search UI should bring to focus the full string match.
Output of checks
Results of GitLab environment info
System information
System: CentOS 7.4.1708
Proxy: no
Current User: git
Using RVM: no
Ruby Version: 2.3.5p376
Gem Version: 2.6.13
Bundler Version: 1.13.7
Rake Version: 12.1.0
Redis Version: 3.2.5
Git Version: 2.13.6
Sidekiq Version: 5.0.4
Go Version: unknown
GitLab information
Version: 10.1.1-ee
Revision: 5da7e03
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: postgresql
DB Version: 9.6.5
URL:
HTTP Clone URL: /some-group/some-project.git
SSH Clone URL: git@:some-group/some-project.git
Elasticsearch: yes
Geo: yes
Geo node: Primary
Using LDAP: yes
Using Omniauth: no
GitLab Shell
Version: 5.9.3
Repository storage paths:
- default: /gitlab/data/repositories
Hooks: /opt/gitlab/embedded/service/gitlab-shell/hooks
Git: /opt/gitlab/embedded/bin/git
Results of GitLab application Check
Checking GitLab Shell ...
GitLab Shell version >= 5.9.3 ? ... OK (5.9.3)
Repo base directory exists?
default... yes
Repo storage directories are symlinks?
default... no
Repo paths owned by git:root, or git:git?
default... yes
Repo paths access is drwxrws---?
default... yes
hooks directories in repos are links: ...
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Check GitLab API access: OK
Redis available via internal API: OK
Access to /var/opt/gitlab/.ssh/authorized_keys: OK
gitlab-shell self-check successful
Checking GitLab Shell ... Finished
Checking Sidekiq ...
Running? ... yes
Number of Sidekiq processes ... 1
Checking Sidekiq ... Finished
Reply by email is disabled in config/gitlab.yml
Checking LDAP ...
Server: ldapmain
not verifying SSL hostname of LDAPS server ''
LDAP authentication... Success
LDAP users with access to your GitLab server (only showing the first 100 results)
Checking LDAP ... Finished
Checking GitLab ...
Git configured correctly? ... yes
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... yes
Init script exists? ... skipped (omnibus-gitlab has no init script)
Init script up-to-date? ... skipped (omnibus-gitlab has no init script)
Projects have namespace: ...
Redis version >= 2.8.0? ... yes
Ruby version >= 2.3.3 ? ... yes (2.3.5)
Git version >= 2.7.3 ? ... yes (2.13.6)
Git user has default SSH configuration? ... no
  Try fixing it:
  mkdir ~/gitlab-check-backup-1510168972
  sudo mv /var/opt/gitlab/.ssh/id_rsa ~/gitlab-check-backup-1510168972
  sudo mv /var/opt/gitlab/.ssh/id_rsa.pub ~/gitlab-check-backup-1510168972
  For more information see:
  doc/ssh/README.md in section "SSH on the GitLab server"
  Please fix the error above and rerun the checks.
Active users: ... 345
Elasticsearch version 5.1 - 5.5? ... yes (5.4.1)
Checking GitLab ... Finished
Possible fixes
The code responsible for grabbing the first "highlight" match is here: https://gitlab.com/gitlab-org/gitlab-ee/blob/master/lib/gitlab/elastic/search_results.rb#L81
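One possible direction, sketched in Python purely for illustration (the linked code is Ruby, and nothing below reflects how search_results.rb is actually written): when the query is wrapped in double quotes, look for the literal phrase in the file content and focus that line, falling back to the current first-highlight line otherwise. `exact_phrase` and `line_to_focus` are hypothetical names.

```python
def exact_phrase(query):
    """Return the text inside double quotes, or None if the query is not quoted."""
    query = query.strip()
    if len(query) >= 2 and query.startswith('"') and query.endswith('"'):
        return query[1:-1]
    return None

def line_to_focus(content, query, fallback_line=1):
    """Pick the 1-based line to focus in the preview.
    Prefer the first line containing the literal quoted phrase; otherwise keep
    whatever line the existing highlight logic chose (fallback_line)."""
    phrase = exact_phrase(query)
    if phrase:
        for line_number, line in enumerate(content.splitlines(), start=1):
            if phrase in line:
                return line_number
    return fallback_line

content = "us\n" + "placeholder\n" * 10 + "region: us-east-2\n"
print(line_to_focus(content, '"us-east-2"'))  # line holding the full string
print(line_to_focus(content, 'us-east-2'))    # unquoted query keeps the fallback
```

This only addresses which line the UI focuses. Restricting the result set to files that contain the full string would still need a change to how the query itself is analyzed.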

