Fix the scroll position in code search results

Review changes
Download
Patches
Plain diff

Dylan Griffith requested to merge 239386-fix-scroll-window-code-search into master Sep 23, 2020

Overview 18
Commits 2
Pipelines 8
Changes 4

What does this MR do?

The previous algorithm was a heuristic approach which looked for a term inside a highlighted fragment. The term in the fragment was then used again to find the matching line in the actual content. This ocassionally gave incorrect results where whatever happened to appear to be highlighted was also found earlier in the document written in a different way which was not intended to be a match.

The Elasticsearch highlighter supports setting number_of_fragments to 0 which means that the highlighted result will actual be the entire content itself rather than small fragments of content. This makes it much easier to figure out the line number since we can just loop through this to begin with and stop as soon as we find the opening highlight tag.

This does come at the cost that all docs are returned in the highlight section now which makes the Elasticsearch response payload approximately twice as large but it is the only correct way to do it that I could find.

An alternative described in the docs is to use boundary_scanner but this requires the Fast vector highlighter which in term requires us to set offsets for our index_options but this will use considerably more storage so we would like to avoid this if possible.

It's worth noting that this won't fix the fact that highlighting is not behaving properly in the below examples. For that I've created #254941 (closed) which is explaining a very similar problem to this.

Screenshots

Below shows examples with the specific files linked that produced the problem.

`buffer.c`

Before

After

`features.yml`

Before

After

Does this MR meet the acceptance criteria?

Conformity

Changelog entry
[-] Documentation (if required)
Code review guidelines
Merge request performance guidelines
Style guides
Database guides
Separation of EE specific content

Availability and Testing

Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
Tested in all supported browsers
Informed Infrastructure department of a default or new setting change, if applicable per definition of done

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

Label as security and @ mention @gitlab-com/gl-security/appsec
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
Security reports checked/validated by a reviewer from the AppSec team

#239386 (closed)

Edited Sep 24, 2020 by Dylan Griffith

Merge request reports

Assignee

Select assignees

Reviewers

Request review from

Time tracking