Update code analyzers
What does this MR do and why?
This MR updates code analyzers for Advanced search in order to improve Code Search quality.
I've created a compact benchmarking script to compare GitLab's code search with an IDE-like experience, specifically with ripgrep, the tool VS Code uses for search. This let us quantify the improvement and confirmed our suspicion that the original plan of replacing code_search_analyzer with code_analyzer is not feasible: it produces a large number of false positives and generally performs worse than our current analyzer. Instead, the most promising results come from the newly discovered word_delimiter_graph filter, which was hidden deep in the Elasticsearch documentation.
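For readers unfamiliar with the filter, here is a minimal sketch of how a word_delimiter_graph filter can be wired into a custom Elasticsearch analyzer. The filter name `code_word_delimiter`, the analyzer name `example_code_analyzer`, the tokenizer choice, and the parameter values are illustrative assumptions, not the configuration introduced by this MR.

```python
import json

# Hypothetical index settings. Names and parameter values are assumptions
# chosen for illustration; they are not taken from this MR.
CODE_INDEX_SETTINGS = {
    "analysis": {
        "filter": {
            "code_word_delimiter": {
                "type": "word_delimiter_graph",
                # Keep the full identifier as a token alongside its parts.
                "preserve_original": True,
                # Split camelCase identifiers at case changes.
                "split_on_case_change": True,
                # Split at letter/number boundaries, e.g. "sha256".
                "split_on_numerics": True,
                "generate_word_parts": True,
                "generate_number_parts": True,
            }
        },
        "analyzer": {
            "example_code_analyzer": {
                "type": "custom",
                # A tokenizer that keeps punctuation, so the filter can
                # do the splitting itself.
                "tokenizer": "whitespace",
                # flatten_graph follows the graph filter because
                # preserve_original emits multi-position tokens, which
                # indexing does not support directly.
                "filter": ["code_word_delimiter", "flatten_graph", "lowercase"],
            }
        },
    }
}


def analyze_request_body(text, analyzer="example_code_analyzer"):
    """Build a request body for the _analyze API to inspect emitted tokens."""
    return {"analyzer": analyzer, "text": text}


print(json.dumps(CODE_INDEX_SETTINGS, indent=2))
print(json.dumps(analyze_request_body("authorized_project_ids_relation")))
```

Sending the second body to the `_analyze` endpoint of an index created with these settings is one way to see which tokens an identifier is split into.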
Here are the benchmarking results:
Approach | Hit Score | False positives | Index size |
---|---|---|---|
GitLab 15.4 (current) | 53.80% | 0% | 100% |
Original plan (code_analyzer) | 24.46% | 86.42% | 100% |
This MR | 91.19% | 4.5% | 87.37% (requires reindex) |
Please note that the benchmarking script contains very few substring examples: substring search is not covered in this first iteration and would most likely require a considerable increase in storage.
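The benchmarking script itself is not included here, so the following is only a sketch of how the two scores could be computed. It assumes that the hit score is the share of ripgrep matches that GitLab also returns and that the false positive score is the share of GitLab results that ripgrep does not report. Both definitions, the function names, and the sample file names are assumptions.

```python
def hit_score(reference, candidate):
    """Percentage of reference (ripgrep) matches also found by the candidate."""
    if not reference:
        return 100.0
    return 100.0 * len(reference & candidate) / len(reference)


def false_positive_score(reference, candidate):
    """Percentage of candidate (GitLab) results absent from the reference."""
    if not candidate:
        return 0.0
    return 100.0 * len(candidate - reference) / len(candidate)


# Made-up result sets for a single query, keyed by file path.
ripgrep_results = {"a.rb", "b.rb", "c.rb", "d.rb"}
gitlab_results = {"a.rb", "b.rb", "e.rb"}

print(f"hit score: {hit_score(ripgrep_results, gitlab_results):.2f}%")
print(f"false positives: {false_positive_score(ripgrep_results, gitlab_results):.2f}%")
```

Under these definitions a search engine can trade one score for the other, which is why the table reports both.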
Raw script output:
##### current #####
> Code Search Quality Comparison: GitLab vs Ripgrep
Ripgrep version: ripgrep 13.0.0
Project: qa-perf-testing/gitlabhq (8f9beefac3774b30e911fb00a68f4c7a5244cf27)
Total hit score: 53.80952%
Total false positive score: 0.0%
##### code_analyzer #####
> Code Search Quality Comparison: GitLab vs Ripgrep
Ripgrep version: ripgrep 13.0.0
Project: qa-perf-testing/gitlabhq (8f9beefac3774b30e911fb00a68f4c7a5244cf27)
Total hit score: 24.46609%
Total false positive score: 86.42857%
##### word_delimiter_graph #####
> Code Search Quality Comparison: GitLab vs Ripgrep
Ripgrep version: ripgrep 13.0.0
Project: qa-perf-testing/gitlabhq (8f9beefac3774b30e911fb00a68f4c7a5244cf27)
Total hit score: 91.19048%
Total false positive score: 4.5%
On top of that, the new approach also makes all our quarantined code search examples introduced in !92375 (merged) green.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #346914 (closed)
Activity
changed milestone to %Backlog
assigned to @dgruzd
Suggested Reviewers (beta)
The individuals below may be good candidates to participate in the review based on various factors.
You can use slash commands in comments to quickly assign: /assign_reviewer @user1.
Suggested Reviewers: @rspeicher, @DylanGriffith, @ahegyi, @mkozono, @nick.thomas
If you do not believe these suggestions are useful, please apply the label Bad Suggested Reviewer. You can also provide feedback for this feature on this issue: https://gitlab.com/gitlab-org/gitlab/-/issues/357923.
Automatically generated by Suggested Reviewers Bot, an experimental ML-based recommendation engine created by ~"group::applied ml".
Edited by GitLab Reviewer-Recommender Bot
- A deleted user
added backend label
Reviewer roulette
Changes that require review have been detected!
Please refer to the table below for assigning reviewers and maintainers suggested by Danger in the specified category:
Category | Reviewer | Maintainer |
---|---|---|
backend | Drew Blessing (@dblessing) (UTC-5, 7 hours behind @dgruzd) | Dylan Griffith (@DylanGriffith) (UTC-5, 7 hours behind @dgruzd) |
To spread load more evenly across eligible reviewers, Danger has picked a candidate for each review slot, based on their timezone. Feel free to override these selections if you think someone else would be better-suited or use the GitLab Review Workload Dashboard to find other available reviewers.
To read more on how to use the reviewer roulette, please take a look at the Engineering workflow and code review guidelines. Please consider assigning a reviewer or maintainer who is a domain expert in the area of the merge request.
Once you've decided who will review this merge request, assign them as a reviewer! Danger does not automatically notify them for you.
If needed, you can retry the 🔁 danger-review job that generated this comment.
Generated by 🚫 Danger
- Resolved by Terri Chu
👋 @dgruzd - please see the following guidance and update this merge request.
1 Warning
⚠ Please add a subtype label to this merge request. If you have added a type label and do not feel the purpose of this merge request matches one of the subtypes labels, please resolve this discussion.
removed workflow::blocked label
removed [deprecated] Accepting merge requests label
added 2904 commits
- ee4c4b4d...d849dbd4 - 2902 commits from branch master
- 5492b769 - Update code analyzers
- 7a353312 - fixup! Update code analyzers
- A deleted user
added feature flag label
added 2 commits
Allure report
allure-report-publisher generated test report!
e2e-package-and-test: ✅ test report for 08c73c93
+---------------------------------------------------------------------------+
|                              suites summary                               |
+----------------------+--------+--------+---------+-------+-------+--------+
|                      | passed | failed | skipped | flaky | total | result |
+----------------------+--------+--------+---------+-------+-------+--------+
| Create               | 320    | 0      | 10      | 0     | 330   | ✅     |
| Package              | 0      | 0      | 6       | 0     | 6     | ➖     |
| Manage               | 198    | 0      | 8       | 0     | 206   | ✅     |
| Verify               | 86     | 0      | 16      | 0     | 102   | ✅     |
| Plan                 | 114    | 0      | 0       | 0     | 114   | ✅     |
| Secure               | 44     | 0      | 2       | 0     | 46    | ✅     |
| Analytics            | 4      | 0      | 0       | 0     | 4     | ✅     |
| Fulfillment          | 4      | 0      | 24      | 0     | 28    | ✅     |
| Release              | 8      | 0      | 0       | 0     | 8     | ✅     |
| Configure            | 0      | 0      | 6       | 0     | 6     | ➖     |
| Version sanity check | 0      | 0      | 2       | 0     | 2     | ➖     |
| Protect              | 4      | 0      | 0       | 0     | 4     | ✅     |
+----------------------+--------+--------+---------+-------+-------+--------+
| Total                | 782    | 0      | 74      | 0     | 856   | ✅     |
+----------------------+--------+--------+---------+-------+-------+--------+
added 274 commits
- b2c169b3...55c4b10c - 267 commits from branch master
- f3a747c6 - Update code analyzers
- eb25d205 - fixup! Update code analyzers
- 5bd1ebfc - fixup! Update code analyzers
- 833a8a6d - fixup! Update code analyzers
- c22da269 - fixup! Update code analyzers
- f6590e04 - fixup! Update code analyzers
- 08c73c93 - fixup! Update code analyzers
changed milestone to %15.5
added 285 commits
- bbbe1bff...5722ae69 - 276 commits from branch master
- 180b9a10 - Update code analyzers
- c672b990 - fixup! Update code analyzers
- 1007c658 - fixup! Update code analyzers
- 7d4fa8de - fixup! Update code analyzers
- 03777f9c - fixup! Update code analyzers
- 799b288a - fixup! Update code analyzers
- fa48da75 - fixup! Update code analyzers
- 1c7f7ad8 - fixup! Update code analyzers
- 8ee6687f - fixup! Update code analyzers
mentioned in issue #346914 (closed)
added 74 commits
- f59f4b84...9c6e95db - 64 commits from branch master
- b610c1ba - Update code analyzers
- 580488f9 - fixup! Update code analyzers
- 813cdc7f - fixup! Update code analyzers
- 1c72295c - fixup! Update code analyzers
- efb5887b - fixup! Update code analyzers
- 9c79564f - fixup! Update code analyzers
- 4bf13f86 - fixup! Update code analyzers
- 22c4dc14 - fixup! Update code analyzers
- e34c711c - fixup! Update code analyzers
- fad86ac2 - fixup! Update code analyzers
marked the checklist item I have evaluated the MR acceptance checklist for this MR. as completed
added 290 commits
- 72cc231e...be249203 - 289 commits from branch master
- b701a8e6 - Update code analyzers
mentioned in issue gitlab-com/gl-infra/production#6116 (closed)
- Resolved by Terri Chu
@john-mason could I ask you to review this MR and assign it to @terrichu if everything's ok? Thank you!
- Resolved by Terri Chu
- Resolved by Terri Chu
- Resolved by Terri Chu
@DylanGriffith Continuing the discussion from #346914 (comment 1117872300). I've just tested your theory against https://gitlab.com/gitlab-org/quality/performance-data/-/raw/master/projects_export/gitlabhq_export.tar.gz?inline=false
If I use this branch and search for authorized_project_ids_relation, it only returns one result even though the repo has a lot of tokens like authorized, project_ids, and relation. I think it's because we use "default_operator": "and" in simple_query_string.
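To illustrate the point about "default_operator": "and", here is a small sketch. The request body uses the real simple_query_string query, but the field name `blob.content` is an assumption, and the `matches` helper is a simplified stand-in for Elasticsearch's term matching, not its actual scoring logic.

```python
def build_code_query(term, default_operator="and"):
    """Build a simple_query_string body; the field name is an assumption."""
    return {
        "query": {
            "simple_query_string": {
                "query": term,
                "fields": ["blob.content"],  # assumed field name
                "default_operator": default_operator,
            }
        }
    }


def matches(doc_tokens, query_tokens, default_operator="and"):
    """Simplified matching: 'and' needs every token, 'or' needs any token."""
    combine = all if default_operator == "and" else any
    return combine(token in doc_tokens for token in query_tokens)


# Token split taken from the comment above, purely for illustration.
query_tokens = ["authorized", "project_ids", "relation"]
docs = {
    "defines_the_method.rb": {"authorized", "project_ids", "relation", "def"},
    "mentions_one_token.rb": {"authorized", "user"},
}

for operator in ("and", "or"):
    hits = [name for name, tokens in docs.items()
            if matches(tokens, query_tokens, operator)]
    print(operator, hits)
```

With "and", only the document containing every token matches, which is consistent with the single result observed in the test above.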
added workflow::in review label
mentioned in merge request !99254 (merged)
mentioned in issue gitlab-com/gl-infra/production#7813 (moved)
requested review from @terrichu and removed review request for @john-mason
👋 @john-mason, thanks for approving this merge request.
This is the first time the merge request is approved. To ensure full test coverage, a new pipeline will be started shortly.
added bug::functional label
- Resolved by Dmitry Gruzd
enabled an automatic merge when the pipeline for e53be327 succeeds
mentioned in issue #376051
mentioned in commit 9613246f
added workflow::staging-canary label and removed workflow::in review label
added workflow::canary label and removed workflow::staging-canary label
added workflow::staging label and removed workflow::canary label
added workflow::production label and removed workflow::staging label
mentioned in issue #350851 (closed)
added workflow::post-deploy-db-production label and removed workflow::production label
mentioned in merge request gitlab-elasticsearch-indexer!163 (merged)
mentioned in issue gitlab-com/gl-infra/production#7879 (closed)
added released::candidate label
mentioned in merge request kubitus-project/kubitus-installer!1521 (merged)
mentioned in issue #378897 (closed)
mentioned in issue gitlab-org/search-team/team-tasks#100 (closed)
mentioned in issue #325234 (closed)
added released::published label and removed released::candidate label
mentioned in merge request !127175 (merged)
mentioned in issue #323662 (closed)
mentioned in issue #423246
mentioned in issue #442450
mentioned in issue #442774