Remove partial word matching from code search

Dmitry Gruzd requested to merge 27918-prefix-code-search into master

What does this MR do?

Code search currently uses n-grams so that queries match prefixes as well as full terms. The n-grams take up a lot of index storage and can be replaced with a prefix search.

This MR removes the usage of edgeNGram_filter from our index mappings.
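To illustrate the storage trade-off, here is a minimal, self-contained sketch in plain Python. It is not the Elasticsearch implementation or GitLab's actual analyzer configuration: the helpers `edge_ngrams` and `prefix_search`, the `min_gram`/`max_gram` values, and the sample tokens are all assumptions made for the example.

```python
from bisect import bisect_left


def edge_ngrams(token, min_gram=2, max_gram=40):
    """Return the leading substrings an edge n-gram style filter would emit.

    min_gram and max_gram are illustrative values, not GitLab's settings.
    """
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def prefix_search(sorted_terms, prefix):
    """Return every term starting with `prefix` from a sorted term list."""
    # All terms sharing a prefix are contiguous in sorted order,
    # so jump to the first candidate and scan until the prefix stops matching.
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches


tokens = ["search", "searchable", "settings"]

# Index-time approach: every leading substring is stored as its own term.
ngram_terms = sorted({gram for t in tokens for gram in edge_ngrams(t)})

# Query-time approach: only the full tokens are stored.
plain_terms = sorted(set(tokens))

print(len(ngram_terms), len(plain_terms))  # 15 3
print("sear" in ngram_terms)               # True (exact lookup on a stored n-gram)
print(prefix_search(plain_terms, "sear"))  # ['search', 'searchable']
```

Both approaches find the same tokens for the prefix `sear`, but the n-gram variant stores five times as many terms for this tiny sample, which is the kind of overhead the MR aims to remove.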

#27918 (closed)

Index size

Mappings                 | Size, MB | %
-------------------------|----------|------------------
current                  | 524.77   | 100.00%
without edgeNGram_filter | 176.55   | 33.64% (-66.36%)
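
The percentages follow directly from the two sizes. A short check, using only the figures reported above (the variable names are chosen for this example):

```python
current_mb = 524.77        # index size with the current mappings
without_ngram_mb = 176.55  # index size without edgeNGram_filter

remaining_pct = without_ngram_mb / current_mb * 100
reduction_pct = 100 - remaining_pct

print(f"{remaining_pct:.2f}% of the original size, {reduction_pct:.2f}% smaller")
```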

Testing with the Performance toolkit

The new mappings show a 7.3% performance improvement in p90.

Before

* Environment:                Localhost
* Environment Version:        13.1.0-pre `5e066ffdf76`
* Option:                     60s_20rps
* Date:                       2020-05-27
* Run Time:                   1m 1.5s (Start: 07:02:03 UTC, End: 07:03:05 UTC)
* GPT Version:                v1.3.1

NAME                                 | RPS  | RPS RESULT        | TTFB AVG | TTFB P90            | REQ STATUS     | RESULT
-------------------------------------|------|-------------------|----------|---------------------|----------------|----------------
api_v4_projects_project_search_blobs | 20/s | 18.67/s (>1.60/s) | 226.78ms | 233.48ms (<15000ms) | 100.00% (>95%) | Passed

After

* Environment:                Localhost
* Environment Version:        13.1.0-pre `5e066ffdf76`
* Option:                     60s_20rps
* Date:                       2020-05-27
* Run Time:                   1m 1.37s (Start: 07:06:20 UTC, End: 07:07:21 UTC)
* GPT Version:                v1.3.1

NAME                                 | RPS  | RPS RESULT       | TTFB AVG | TTFB P90            | REQ STATUS     | RESULT
-------------------------------------|------|------------------|----------|---------------------|----------------|----------------
api_v4_projects_project_search_blobs | 20/s | 19.5/s (>1.60/s) | 157.64ms | 201.86ms (<15000ms) | 100.00% (>95%) | Passed
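
The relative change between the two runs above can be computed from the TTFB columns. This is a sketch using only the single Before and After runs shown here; `pct_improvement` is a helper name chosen for the example. For these two runs, p90 works out to roughly 13.5%, which differs from the 7.3% quoted in the summary, so that figure may be based on other runs.

```python
def pct_improvement(before_ms, after_ms):
    """Relative reduction in latency, as a percentage of the 'before' value."""
    return (before_ms - after_ms) / before_ms * 100


# TTFB values taken from the Before and After tables.
ttfb_avg = pct_improvement(226.78, 157.64)
ttfb_p90 = pct_improvement(233.48, 201.86)

print(f"TTFB avg: {ttfb_avg:.2f}% faster")
print(f"TTFB p90: {ttfb_p90:.2f}% faster")
```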

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team