Skip to content

Fix the scroll position in code search results

Dylan Griffith requested to merge 239386-fix-scroll-window-code-search into master

What does this MR do?

The previous algorithm was a heuristic approach which looked for a term inside a highlighted fragment. The term in the fragment was then used again to find the matching line in the actual content. This ocassionally gave incorrect results where whatever happened to appear to be highlighted was also found earlier in the document written in a different way which was not intended to be a match.

The Elasticsearch highlighter supports setting number_of_fragments to 0 which means that the highlighted result will actual be the entire content itself rather than small fragments of content. This makes it much easier to figure out the line number since we can just loop through this to begin with and stop as soon as we find the opening highlight tag.

This does come at the cost that all docs are returned in the highlight section now which makes the Elasticsearch response payload approximately twice as large but it is the only correct way to do it that I could find.

An alternative described in the docs is to use boundary_scanner but this requires the Fast vector highlighter which in term requires us to set offsets for our index_options but this will use considerably more storage so we would like to avoid this if possible.

It's worth noting that this won't fix the fact that highlighting is not behaving properly in the below examples. For that I've created #254941 (closed) which is explaining a very similar problem to this.

Screenshots

Below shows examples with the specific files linked that produced the problem.

buffer.c

Before

Screen_Shot_2020-09-23_at_4.54.04_pm

After

Screen_Shot_2020-09-23_at_5.03.00_pm

features.yml

Before

Screen_Shot_2020-09-23_at_4.52.57_pm

After

Screen_Shot_2020-09-23_at_5.03.21_pm

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

#239386 (closed)

Edited by Dylan Griffith

Merge request reports