Update code analyzers

Dmitry Gruzd requested to merge 346914-search-separators into master

What does this MR do and why?

This MR updates code analyzers for Advanced search in order to improve Code Search quality.

I've created a compact benchmarking script that compares GitLab's code search with an IDE-like experience, specifically ripgrep, the tool VS Code uses for search. This enabled us to quantify the improvement and confirmed our suspicion that the original plan of replacing code_search_analyzer with code_analyzer is not feasible: it produces a large number of false positives and generally performs worse than our current analyzer. Instead, the most promising results come from the newly discovered word_delimiter_graph filter, which was hidden deep in the Elasticsearch documentation.
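For readers unfamiliar with the filter, here is a minimal sketch of how a custom analyzer built on `word_delimiter_graph` could be declared in index settings. The names `code_word_delimiter` and `example_code_analyzer`, the choice of tokenizer, and the parameter values are assumptions for illustration only and are not the configuration introduced by this MR. Only the filter type and its parameters come from the Elasticsearch documentation.

```python
import json


def build_index_settings() -> dict:
    """Return hypothetical index settings using the word_delimiter_graph filter.

    All names and values here are illustrative assumptions, not this MR's config.
    """
    return {
        "analysis": {
            "filter": {
                # Hypothetical filter name; splits tokens on separators,
                # case changes, and letter/digit boundaries.
                "code_word_delimiter": {
                    "type": "word_delimiter_graph",
                    "preserve_original": True,
                    "split_on_case_change": True,
                    "split_on_numerics": True,
                }
            },
            "analyzer": {
                # Hypothetical analyzer name.
                "example_code_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    # flatten_graph is recommended by the Elasticsearch docs
                    # when a graph filter is used in an index-time analyzer.
                    "filter": ["code_word_delimiter", "flatten_graph", "lowercase"],
                }
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_index_settings(), indent=2))
```

With settings along these lines, a token such as `search_separators` would be expected to be indexed both as the original token and as its parts, which is the kind of behavior the benchmark rewards.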

Here are the benchmarking results:

| Approach | Hit score | False positives | Index size |
|----------|-----------|-----------------|------------|
| GitLab 15.4 (current) | 53.80% | 0% | 100% |
| Original plan (code_analyzer) | 24.46% | 86.42% | 100% |
| This MR | 91.19% | 4.5% | 87.37% (requires reindex) |
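The MR does not spell out how the two scores are calculated, so the following is only one plausible reading, sketched under assumptions: the hit score is the share of ripgrep matches that GitLab search also returns, and the false-positive score is the share of GitLab results that ripgrep does not confirm. The function names and the per-query file sets are hypothetical.

```python
from __future__ import annotations


def hit_score(expected: set[str], actual: set[str]) -> float:
    """Percentage of reference (ripgrep) matches that the search also returned.

    Assumed definition; the MR's script may compute this differently.
    """
    if not expected:
        return 100.0
    return 100.0 * len(expected & actual) / len(expected)


def false_positive_score(expected: set[str], actual: set[str]) -> float:
    """Percentage of search results that the reference tool did not match.

    Assumed definition; the MR's script may compute this differently.
    """
    if not actual:
        return 0.0
    return 100.0 * len(actual - expected) / len(actual)


if __name__ == "__main__":
    # Hypothetical file lists for a single query.
    ripgrep_files = {"a.rb", "b.rb", "c.rb", "d.rb"}
    gitlab_files = {"a.rb", "b.rb", "c.rb", "x.rb"}
    print(hit_score(ripgrep_files, gitlab_files))             # 75.0
    print(false_positive_score(ripgrep_files, gitlab_files))  # 25.0
```

Under this reading, a "total" score would be an aggregate of the per-query values across all benchmark queries.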

Please note that the benchmarking script contains very few substring examples, because substring search is not covered in this first iteration and would most likely require a considerable increase in storage.
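To give a rough sense of why substring matching can be expensive, the sketch below counts the terms produced if substrings were indexed as n-grams. This is an assumption for illustration: the MR does not say which technique substring support would use, and the `ngrams` helper and its `min_len` default are hypothetical.

```python
from __future__ import annotations

from typing import Optional


def ngrams(token: str, min_len: int = 3, max_len: Optional[int] = None) -> list[str]:
    """Return every substring of token with length in [min_len, max_len]."""
    if max_len is None:
        max_len = len(token)
    grams = []
    for size in range(min_len, max_len + 1):
        # Slide a window of the current size across the token.
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


if __name__ == "__main__":
    token = "separator"
    # One whole-word term versus every substring of length 3 or more.
    print(len(ngrams(token)))  # 28
```

The number of terms grows roughly quadratically with token length, which is consistent with the expectation that substring support would need noticeably more storage.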

Raw script output:
##### current #####
> Code Search Quality Comparison: GitLab vs Ripgrep
Ripgrep version: ripgrep 13.0.0
Project: qa-perf-testing/gitlabhq (8f9beefac3774b30e911fb00a68f4c7a5244cf27)
Total hit score: 53.80952%
Total false positive score: 0.0%
##### code_analyzer #####
> Code Search Quality Comparison: GitLab vs Ripgrep
Ripgrep version: ripgrep 13.0.0
Project: qa-perf-testing/gitlabhq (8f9beefac3774b30e911fb00a68f4c7a5244cf27)
Total hit score: 24.46609%
Total false positive score: 86.42857%
##### word_delimiter_graph #####
> Code Search Quality Comparison: GitLab vs Ripgrep
Ripgrep version: ripgrep 13.0.0
Project: qa-perf-testing/gitlabhq (8f9beefac3774b30e911fb00a68f4c7a5244cf27)
Total hit score: 91.19048%
Total false positive score: 4.5%

On top of that, the new approach turns all the quarantined code search examples introduced in !92375 (merged) green.

Related to #346914 (closed)

Edited by Dmitry Gruzd