Fix zoekt match count
What does this MR do and why?
Change the Zoekt multi-match GraphQL response to use match_count instead of ngram_match_count. The ngram match count can include duplicate matches for the same actual match (I believe because a single match can hit multiple ngrams).
The MR also contains a small code refactor and some spec changes.
While testing this, I found that Zoekt counts multiple matches on the same line as 1 match. The Zoekt API and UI behave the same way.
Zoekt
example
Zoekt API request
POST http://127.0.0.1:6090/api/search
{
"Q": "maintainers",
"Opts": { "TotalMaxMatchCount": 5000, "NumContextLines": 1 },
"RepoIDs": [2]
}
Zoekt API response
{
  "Result": {
    "ContentBytesLoaded": 9086,
    "IndexBytesLoaded": 102,
    "Crashes": 0,
    "Duration": 99500,
    "FileCount": 1,
    "ShardFilesConsidered": 0,
    "FilesConsidered": 1,
    "FilesLoaded": 1,
    "FilesSkipped": 0,
    "ShardsScanned": 1,
    "ShardsSkipped": 0,
    "ShardsSkippedFilter": 0,
    "MatchCount": 1,
    "NgramMatches": 2,
    "NgramLookups": 128,
    "Wait": 8500,
    "MatchTreeConstruction": 45625,
    "MatchTreeSearch": 17791,
    "RegexpsConsidered": 0,
    "FlushReason": 0,
    "Files": [
      {
        "FileName": "PROCESS.md",
        "Repository": "2",
        "Version": "ddd0f15ae83993f5cb66a927a28673882e99100b",
        "Language": "Markdown",
        "Branches": [
          "HEAD"
        ],
        "LineMatches": [
          {
            "Line": "QmVsb3cgd2UgZGVzY3JpYmUgdGhlIGNvbnRyaWJ1dGluZyBwcm9jZXNzIHRvIEdpdExhYiBmb3IgdHdvIHJlYXNvbnMuIFNvIHRoYXQgY29udHJpYnV0b3JzIGtub3cgd2hhdCB0byBleHBlY3QgZnJvbSBtYWludGFpbmVycyAocG9zc2libGUgcmVzcG9uc2VzLCBmcmllbmRseSB0cmVhdG1lbnQsIGV0Yy4pLiBBbmQgc28gdGhhdCBtYWludGFpbmVycyBrbm93IHdoYXQgdG8gZXhwZWN0IGZyb20gY29udHJpYnV0b3JzICh1c2UgdGhlIGxhdGVzdCB2ZXJzaW9uLCBlbnN1cmUgdGhhdCB0aGUgaXNzdWUgaXMgYWRkcmVzc2VkLCBmcmllbmRseSB0cmVhdG1lbnQsIGV0Yy4pLgo=",
            "LineStart": 82,
            "LineEnd": 408,
            "LineNumber": 5,
            "Before": "Cg==",
            "After": "Cg==",
            "FileName": false,
            "Score": 501,
            "DebugScore": "",
            "LineFragments": [
              {
                "LineOffset": 116,
                "Offset": 198,
                "MatchLength": 11,
                "SymbolInfo": null
              },
              {
                "LineOffset": 188,
                "Offset": 270,
                "MatchLength": 11,
                "SymbolInfo": null
              }
            ]
          }
        ],
        "Checksum": "4j/0Byr+RBs=",
        "Score": 508.5365853658537,
        "RepositoryID": 2
      }
    ],
    "RepoURLs": {
      "2": ""
    },
    "LineFragments": {
      "2": ""
    }
  }
}
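To make the same-line behavior easier to see, here is a minimal sketch that decodes the base64 Line value from the response above and slices out each entry in LineFragments. It assumes LineOffset is a byte offset into the decoded line and MatchLength is a byte length; the helper name fragment_texts is hypothetical and not part of Zoekt or GitLab.

```python
import base64

# "Line" value copied verbatim from the single LineMatches entry above.
line_b64 = "QmVsb3cgd2UgZGVzY3JpYmUgdGhlIGNvbnRyaWJ1dGluZyBwcm9jZXNzIHRvIEdpdExhYiBmb3IgdHdvIHJlYXNvbnMuIFNvIHRoYXQgY29udHJpYnV0b3JzIGtub3cgd2hhdCB0byBleHBlY3QgZnJvbSBtYWludGFpbmVycyAocG9zc2libGUgcmVzcG9uc2VzLCBmcmllbmRseSB0cmVhdG1lbnQsIGV0Yy4pLiBBbmQgc28gdGhhdCBtYWludGFpbmVycyBrbm93IHdoYXQgdG8gZXhwZWN0IGZyb20gY29udHJpYnV0b3JzICh1c2UgdGhlIGxhdGVzdCB2ZXJzaW9uLCBlbnN1cmUgdGhhdCB0aGUgaXNzdWUgaXMgYWRkcmVzc2VkLCBmcmllbmRseSB0cmVhdG1lbnQsIGV0Yy4pLgo="

# LineOffset / MatchLength pairs copied from "LineFragments" above.
line_fragments = [
    {"LineOffset": 116, "MatchLength": 11},
    {"LineOffset": 188, "MatchLength": 11},
]


def fragment_texts(encoded_line, fragments):
    """Return the matched text for each fragment of one base64-encoded line.

    Assumption: LineOffset and MatchLength are expressed in bytes.
    """
    line = base64.b64decode(encoded_line)
    texts = []
    for fragment in fragments:
        start = fragment["LineOffset"]
        end = start + fragment["MatchLength"]
        texts.append(line[start:end].decode("utf-8"))
    return texts


matched = fragment_texts(line_b64, line_fragments)
print(matched)
```

Both fragments belong to the same LineMatches entry (LineNumber 5 of PROCESS.md), and the response reports MatchCount 1 for it.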
Advanced search
Advanced search also shows 1 result for the same search.
References
Please include cross links to any resources that are relevant to this MR. This will give reviewers and future readers helpful context to give an efficient review of the changes introduced.
- Related to #509938 (closed)
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Screenshots or screen recordings
N/A
How to set up and validate locally
- enable Zoekt
- turn off the feature flag for multi match: zoekt_multimatch_frontend
- perform group and project code searches for some terms
- perform the same group and project searches using GraphQL
- ensure the match counts are the same
group search
{
  blobSearch(search: "use.*egex", groupId: "gid://gitlab/Group/24") {
    matchCount
    perPage
    fileCount
    searchType
    searchLevel
    files {
      path
      fileUrl
      chunks {
        matchCountInChunk
        lines {
          lineNumber
          text
          richText
        }
      }
    }
  }
}
project search
{
  blobSearch(search: "use.*egex", projectId: "gid://gitlab/Project/2") {
    matchCount
    perPage
    fileCount
    searchType
    searchLevel
    files {
      path
      fileUrl
      chunks {
        matchCountInChunk
        lines {
          lineNumber
          text
          richText
        }
      }
    }
  }
}
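The group and project queries above differ only in the scope argument, so when repeating the check for several search terms it may help to generate both request bodies from one template. This is a rough sketch only: blob_search_payload is a hypothetical helper, and sending the body to the GraphQL endpoint of a local GitLab instance (with authentication) is left out.

```python
import json

# Selection set copied from the two queries above.
FIELDS = """
    matchCount
    perPage
    fileCount
    searchType
    searchLevel
    files {
      path
      fileUrl
      chunks {
        matchCountInChunk
        lines { lineNumber text richText }
      }
    }
"""


def blob_search_payload(search, scope_arg, gid):
    """Build a JSON request body for a blobSearch query.

    scope_arg must be "groupId" or "projectId"; gid is the matching global ID.
    """
    if scope_arg not in ("groupId", "projectId"):
        raise ValueError("scope_arg must be 'groupId' or 'projectId'")
    # json.dumps is used here to quote and escape the string arguments.
    query = (
        "{ blobSearch(search: " + json.dumps(search)
        + ", " + scope_arg + ": " + json.dumps(gid)
        + ") {" + FIELDS + "} }"
    )
    return json.dumps({"query": query})


group_body = blob_search_payload("use.*egex", "groupId", "gid://gitlab/Group/24")
project_body = blob_search_payload("use.*egex", "projectId", "gid://gitlab/Project/2")
```

The matchCount returned for each body can then be compared with the count shown in the UI for the same term.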

