
Advanced search: Remove prefixed asterisk for path filters

What does this MR do and why?

Background

We are seeing very slow queries (often around 6s) when doing code search with an additional path filter (for example, return path:lib/api).

The reason is that we use a wildcard to match the path: { wildcard: *<query>* }. The Elasticsearch docs warn against wildcard queries that start with * or ? because of their performance implications.

Fix

Remove the prefixed asterisk so that only the suffix of the path is a wildcard. Because we use the Path Hierarchy Tokenizer, which splits the path at the delimiter (/), the term can start anywhere after a slash in the path and does not have to be at the start of the file's path.
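The change to the query clause can be sketched as follows. This is a minimal illustration, not the code in this MR: the helper name path_wildcard_clause and the field name "path" are hypothetical placeholders, and only the shape of the wildcard pattern is taken from the description above.

```python
import json


def path_wildcard_clause(path_filter, field="path", leading_wildcard=False):
    """Build an Elasticsearch wildcard clause for a path filter.

    `field` is a hypothetical placeholder; the real index field name
    is not stated in this description.
    """
    # Before the fix the pattern was *<query>*; after it is <query>*.
    pattern = f"*{path_filter}*" if leading_wildcard else f"{path_filter}*"
    return {"wildcard": {field: {"value": pattern}}}


before = path_wildcard_clause("lib/api", leading_wildcard=True)
after = path_wildcard_clause("lib/api")

print(json.dumps(before))
print(json.dumps(after))
```

Without the leading asterisk, the pattern has a fixed prefix, which is the case the Elasticsearch docs do not warn about.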

Examples

  • For query lib/api, results with the path */lib/api* will still be returned (including spec/lib/api for example).
  • For query ib/api, results with the path */lib/api* will not be returned but would have previously.
  • For query lib/ap, results with the path */lib/api* will still be returned (including spec/lib/api for example).
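The examples above can be reproduced with a small in-memory simulation. It assumes the tokenizer emits one token per path suffix that starts after a delimiter (as the path hierarchy tokenizer does in reverse mode), which is what allows a term to match "anywhere after a slash". The function names and the sample path spec/lib/api/entities.rb are hypothetical, and fnmatch only approximates Elasticsearch wildcard matching.

```python
from fnmatch import fnmatchcase


def reverse_path_tokens(path, delimiter="/"):
    """Return every suffix of `path` that starts at a delimiter boundary."""
    parts = path.split(delimiter)
    return [delimiter.join(parts[i:]) for i in range(len(parts))]


def matches(query, path, leading_wildcard=False):
    """Check whether any token of `path` matches the wildcard pattern."""
    # leading_wildcard=True models the old *<query>* pattern,
    # leading_wildcard=False models the new <query>* pattern.
    pattern = f"*{query}*" if leading_wildcard else f"{query}*"
    return any(fnmatchcase(token, pattern) for token in reverse_path_tokens(path))


sample = "spec/lib/api/entities.rb"  # hypothetical file path
for query in ("lib/api", "ib/api", "lib/ap"):
    old = matches(query, sample, leading_wildcard=True)
    new = matches(query, sample)
    print(f"{query}: before={old} after={new}")
```

Only ib/api changes behaviour: it matched with the leading asterisk but no longer does, because no token starts with ib/api.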

Result

Comparing the production search profiles before and after the change, the new query is roughly 470 times faster at finding the matching paths.

Search profile before: 20.5s

Search profile after: 43.9ms

How to set up and validate locally

  1. Ensure Elasticsearch is running
  2. Check out master
  3. Do a search with the path filter, e.g. /search?scope=blobs&search=path%3Aapi%2Fentities
  4. Check this branch out
  5. Do a search with the path filter and verify that results are as expected, e.g. /search?scope=blobs&search=path%3Aapi%2Fentities
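To try other path filters in steps 3 and 5, the search URL can be built rather than hand-encoded. This is a small sketch using only the Python standard library; the helper name search_url is hypothetical, and the result still has to be appended to the host of your local instance.

```python
from urllib.parse import urlencode


def search_url(path_filter, scope="blobs"):
    """Build the relative search URL for a path filter such as api/entities."""
    # urlencode percent-encodes ':' and '/' in the search term.
    query = urlencode({"scope": scope, "search": f"path:{path_filter}"})
    return f"/search?{query}"


print(search_url("api/entities"))
# prints /search?scope=blobs&search=path%3Aapi%2Fentities
```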

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #390177 (closed)

Edited by Madelein van Niekerk
