Update code analyzers
What does this MR do and why?
This MR updates the code analyzers used by Advanced Search to improve code search quality.
I've created a compact benchmarking script that compares GitLab's code search with an IDE-like experience, specifically with ripgrep, the tool VS Code uses for search. This enabled us to quantify the improvement and confirmed our suspicion that the original plan of replacing `code_search_analyzer` with `code_analyzer` is not feasible: it produces a large number of false positives and generally performs worse than our current analyzer. Instead, the most promising results come from the newly discovered `word_delimiter_graph` filter, which was hidden deep in the Elasticsearch documentation.
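For reviewers unfamiliar with the filter, here is a minimal sketch of how a custom analyzer built around `word_delimiter_graph` might be declared. It is not the configuration from this MR: the names `code_word_delimiter` and `code_sketch_analyzer` and the choice of options are illustrative assumptions. The `whitespace` tokenizer and the `flatten_graph` step follow the general guidance in the Elasticsearch documentation for using this filter at index time with `preserve_original` enabled.

```python
import json


def build_index_settings() -> dict:
    """Return hypothetical index settings that use word_delimiter_graph.

    All names below are made up for illustration; only the filter types and
    their parameters are standard Elasticsearch options.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Hypothetical filter name. By default the filter splits
                    # tokens on non-alphanumeric characters, case changes and
                    # letter/number transitions.
                    "code_word_delimiter": {
                        "type": "word_delimiter_graph",
                        # Also keep the unsplit token, so a search for the
                        # full identifier can still match.
                        "preserve_original": True,
                    }
                },
                "analyzer": {
                    # Hypothetical analyzer name.
                    "code_sketch_analyzer": {
                        "type": "custom",
                        # A tokenizer that keeps punctuation, so the filter
                        # still sees delimiters such as "_" and ".".
                        "tokenizer": "whitespace",
                        "filter": [
                            "code_word_delimiter",
                            # Needed at index time because preserve_original
                            # produces multi-position tokens.
                            "flatten_graph",
                            "lowercase",
                        ],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_index_settings(), indent=2))
```

With settings along these lines, an identifier such as `code_search_analyzer` would be indexed both as a whole and as its parts, which is the kind of matching the hit score rewards.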
Here are the benchmarking results:
Approach | Hit Score | False positives | Index size |
---|---|---|---|
GitLab 15.4 (current) | 53.80% | 0% | 100% |
Original plan (`code_analyzer`) | 24.46% | 86.42% | 100% |
This MR | 91.19% | 4.5% | 87.37% (requires reindex) |
Please note that the benchmarking script contains very few substring examples, because substring search is out of scope for this first iteration and would most likely require a considerable increase in storage.
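The script's exact formulas are not reproduced in this description, so the following is only a sketch of one plausible way to derive the two percentages. It assumes that ripgrep's results are the reference set, that the hit score is the share of reference results GitLab also returns, that the false positive score is the share of GitLab results absent from the reference, and that totals are plain averages over queries. The function names `score_query` and `total_scores` are hypothetical.

```python
from typing import Dict, Set, Tuple


def score_query(reference: Set[str], candidate: Set[str]) -> Tuple[float, float]:
    """Return (hit %, false positive %) for a single query.

    reference: results from the ground-truth tool (assumed: ripgrep).
    candidate: results from the search under test (assumed: GitLab).
    """
    # Hit score: share of reference results that the candidate also found.
    if reference:
        hit_pct = 100.0 * len(reference & candidate) / len(reference)
    else:
        hit_pct = 100.0  # nothing to find, so nothing was missed
    # False positives: share of candidate results that are not in the reference.
    if candidate:
        fp_pct = 100.0 * len(candidate - reference) / len(candidate)
    else:
        fp_pct = 0.0  # no results returned, so no false positives
    return hit_pct, fp_pct


def total_scores(results: Dict[str, Tuple[Set[str], Set[str]]]) -> Tuple[float, float]:
    """Average the per-query scores (assumed aggregation) over all queries."""
    if not results:
        raise ValueError("at least one query is required")
    scores = [score_query(ref, cand) for ref, cand in results.values()]
    hit_total = sum(hit for hit, _ in scores) / len(scores)
    fp_total = sum(fp for _, fp in scores) / len(scores)
    return hit_total, fp_total


if __name__ == "__main__":
    example = {
        "query one": ({"a.rb", "b.rb"}, {"a.rb", "c.rb"}),
        "query two": ({"x.rb"}, {"x.rb"}),
    }
    print(total_scores(example))
```

Under these assumptions, a low hit score means the analyzer misses matches that ripgrep finds, while a high false positive score means it returns results that ripgrep does not consider matches.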
Raw script output:
##### current #####
> Code Search Quality Comparison: GitLab vs Ripgrep
Ripgrep version: ripgrep 13.0.0
Project: qa-perf-testing/gitlabhq (8f9beefac3774b30e911fb00a68f4c7a5244cf27)
Total hit score: 53.80952%
Total false positive score: 0.0%
##### code_analyzer #####
> Code Search Quality Comparison: GitLab vs Ripgrep
Ripgrep version: ripgrep 13.0.0
Project: qa-perf-testing/gitlabhq (8f9beefac3774b30e911fb00a68f4c7a5244cf27)
Total hit score: 24.46609%
Total false positive score: 86.42857%
##### word_delimiter_graph #####
> Code Search Quality Comparison: GitLab vs Ripgrep
Ripgrep version: ripgrep 13.0.0
Project: qa-perf-testing/gitlabhq (8f9beefac3774b30e911fb00a68f4c7a5244cf27)
Total hit score: 91.19048%
Total false positive score: 4.5%
On top of that, the new approach also makes all our quarantined code search examples introduced in !92375 (merged) green.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #346914 (closed)