
Update code analyzers

Merged: Dmitry Gruzd requested to merge 346914-search-separators into master
All threads resolved!

What does this MR do and why?

This MR updates the code analyzers used by Advanced Search to improve code search quality.

I've created a compact benchmarking script that compares GitLab's code search with an IDE-like experience, specifically with ripgrep, the search tool used in VS Code. The script let us quantify the improvement. It also confirmed our suspicion that the original plan of replacing code_search_analyzer with code_analyzer is not feasible: code_analyzer produces a large number of false positives and generally performs worse than our current analyzer. Instead, the most promising results come from the newly discovered word_delimiter_graph token filter, which was hidden deep in the Elasticsearch documentation.
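
To illustrate why word_delimiter_graph suits code, here is a minimal Python sketch that approximates the filter's default splitting rules: it breaks a token on non-alphanumeric characters, on lowercase-to-uppercase transitions, and on letter/digit boundaries, while an uppercase letter followed by another letter does not start a new part. This is an assumption-laden approximation, not Lucene's implementation and not the exact analyzer configuration used in this MR (which is not shown here). The helper word_delimiter_parts is hypothetical, and it ignores stem_english_possessive, the catenate_* options, token offsets, and graph positions.

```python
# Hypothetical helper: a simplified approximation of the default splitting
# rules of Elasticsearch's word_delimiter_graph token filter. It is NOT the
# Lucene implementation and ignores stem_english_possessive, catenate_*,
# token offsets, and graph positions.


def _char_class(ch: str) -> str:
    """Classify a character for the splitting rules below."""
    if ch.isupper():
        return "upper"
    if ch.isalpha():
        return "lower"  # lowercase and uncased letters
    if ch.isdigit():
        return "digit"
    return "delim"  # anything else (_, :, ., #, -, ...) separates parts


def _is_break(last: str, cls: str) -> bool:
    """Return True if a new part starts between a `last`-class and a `cls`-class char."""
    if last == cls:
        return False  # same class: the part continues
    if last == "upper" and cls == "lower":
        return False  # uppercase followed by a letter does not split ("HTTPResponse")
    return True  # lower -> upper (camelCase) and letter <-> digit (XL500) split


def word_delimiter_parts(token: str, preserve_original: bool = False) -> list[str]:
    """Split a single token into sub-word parts."""
    parts: list[str] = []
    current = ""
    last = ""
    for ch in token:
        cls = _char_class(ch)
        if cls == "delim":
            # Delimiters end the current part and are dropped.
            if current:
                parts.append(current)
            current = ""
            continue
        if current and _is_break(last, cls):
            parts.append(current)
            current = ""
        current += ch
        last = cls
    if current:
        parts.append(current)
    # Loosely mimics preserve_original: also emit the unsplit token if it was split.
    if preserve_original and parts != [token]:
        return [token] + parts
    return parts


for t in ["Foo::BarBaz.new", "parseHTTPResponse2", "code_search_analyzer"]:
    print(t, "->", word_delimiter_parts(t))
```

With this splitting, Foo::BarBaz.new yields the parts Foo, Bar, Baz, and new, so a query for Baz can match it, whereas an analyzer that keeps the whole expression as one token would miss it. Passing preserve_original=True additionally keeps the unsplit token so that exact-identifier searches still have something to match.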

Here are the benchmarking results:

| Approach | Hit score | False positives | Index size |
|---|---|---|---|
| GitLab 15.4 (current) | 53.80% | 0% | 100% |
| Original plan (code_analyzer) | 24.46% | 86.42% | 100% |
| This MR (word_delimiter_graph) | 91.19% | 4.5% | 87.37% (requires reindex) |

Note that the benchmarking script contains very few substring examples: substring matching is out of scope for this first iteration and would most likely require a considerable increase in storage.

Raw script output:
##### current #####
> Code Search Quality Comparison: GitLab vs Ripgrep
Ripgrep version: ripgrep 13.0.0
Project: qa-perf-testing/gitlabhq (8f9beefac3774b30e911fb00a68f4c7a5244cf27)
Total hit score: 53.80952%
Total false positive score: 0.0%
##### code_analyzer #####
> Code Search Quality Comparison: GitLab vs Ripgrep
Ripgrep version: ripgrep 13.0.0
Project: qa-perf-testing/gitlabhq (8f9beefac3774b30e911fb00a68f4c7a5244cf27)
Total hit score: 24.46609%
Total false positive score: 86.42857%
##### word_delimiter_graph #####
> Code Search Quality Comparison: GitLab vs Ripgrep
Ripgrep version: ripgrep 13.0.0
Project: qa-perf-testing/gitlabhq (8f9beefac3774b30e911fb00a68f4c7a5244cf27)
Total hit score: 91.19048%
Total false positive score: 4.5%

In addition, the new approach turns all of the quarantined code search examples introduced in !92375 (merged) green.

Related to #346914 (closed)

Edited by Dmitry Gruzd

Activity

  • John Mason removed review request for @john-mason

  • Dmitry Gruzd requested review from @john-mason

  • Dmitry Gruzd mentioned in merge request !99254 (merged)

  • John Mason approved this merge request

  • John Mason requested review from @terrichu and removed review request for @john-mason

  • Terri Chu approved this merge request

  • Terri Chu resolved all threads

  • Terri Chu enabled an automatic merge when the pipeline for e53be327 succeeds

  • Dylan Griffith mentioned in issue #376051

  • merged

  • Terri Chu mentioned in commit 9613246f

  • added ~"workflow::staging" label and removed ~"workflow::canary" label

  • mentioned in issue #350851 (closed)

  • mentioned in issue #378897 (closed)

  • mentioned in issue #325234 (closed)

  • Changzheng Liu mentioned in merge request !127175 (merged)

  • mentioned in issue #323662 (closed)

  • Changzheng Liu mentioned in issue #423246

  • Dmitry Gruzd mentioned in issue #442450

  • mentioned in issue #442774
