Use prefix search instead of ngrams for sha fields

Dmitry Gruzd requested to merge 27789-sha-prefix-search into master

What does this MR do?

Currently, SHAs are indexed using ngrams of 5 to 40 characters, so each SHA is split into about 35 separate terms, which takes up a lot of storage. SHAs are usually unique from the first 4-5 characters on, so a simple prefix search should be sufficiently fast and as effective as ngrams with term matching.

This MR replaces the current ngram analyzers with prefix search.
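
To illustrate the trade-off described above, here is a minimal Python sketch. It is not the code from this MR: it assumes the ngrams are leading substrings of the SHA (edge-ngram style), and the function names `edge_ngrams`, `term_match` and `prefix_match` as well as the sample SHA are hypothetical, chosen only for illustration.

```python
from typing import List, Set


def edge_ngrams(value: str, min_gram: int = 5, max_gram: int = 40) -> List[str]:
    """Return every leading substring of length min_gram..max_gram (inclusive).

    Assumption: this mimics an edge-ngram style analyzer; the real analyzer
    configuration is not shown in this MR description.
    """
    upper = min(max_gram, len(value))
    return [value[:n] for n in range(min_gram, upper + 1)]


def term_match(indexed_terms: Set[str], query: str) -> bool:
    """ngram approach: the query must equal one of the stored terms."""
    return query in indexed_terms


def prefix_match(indexed_value: str, query: str) -> bool:
    """Prefix approach: store the SHA once and compare leading characters."""
    return indexed_value.startswith(query)


# Sample 40-character SHA used purely as test data.
sha = "da39a3ee5e6b4b0d3255bfef95601890afd80709"

terms = edge_ngrams(sha)   # many stored terms per SHA
query = sha[:8]            # a typical abbreviated SHA

print(len(terms))                              # stored terms with ngrams
print(term_match(set(terms), query))           # found via exact term match
print(prefix_match(sha, query))                # found via prefix on one term
```

Under these assumptions, both approaches find the same abbreviated SHA, but the ngram variant stores dozens of terms per SHA (36 with inclusive bounds of 5 and 40) while the prefix variant stores one.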

Original issue

Testing different options on a project from gitlabhq_export.tar.gz

| Option | Size, MB | % of ngrams size | Change |
|---|---|---|---|
| ngrams | 899.1 | 100.00% | |
| prefix search | 788.67 | 87.71% | -12.29% |
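
The percentages can be rechecked from the two sizes with a short Python sketch. The variable names are illustrative, and the only inputs are the two measurements reported above.

```python
# Index sizes in MB, as reported in the comparison above.
ngrams_mb = 899.1
prefix_mb = 788.67

# Size of the prefix-search index relative to the ngrams index, in percent.
relative_pct = prefix_mb / ngrams_mb * 100

# Absolute and relative storage saved by switching to prefix search.
saved_mb = ngrams_mb - prefix_mb
saved_pct = 100 - relative_pct

print(f"relative size: {relative_pct:.2f}%")
print(f"saved: {saved_mb:.2f} MB ({saved_pct:.2f}%)")
```

The result agrees with the reported figures to within rounding in the second decimal place: a saving of roughly 110 MB, or about 12%.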

Screenshots

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team
Edited by 🤖 GitLab Bot 🤖