The performance of the Zoekt GraphQL API should be improved
Background
We've discovered that the new Zoekt GraphQL API is considerably slower than the legacy (original) implementation. We should profile it and improve its performance.
Benchmark
I took the production response for the search term zoekt and benchmarked the old and new implementations against it:
❯ rails runner benchmark.rb
----------------------------- LOADING PREVIEWS
----------------------------- LOADING PAGES
Warming up --------------------------------------
old 244.000 i/100ms
new 1.000 i/100ms
Calculating -------------------------------------
old 2.370k (± 8.6%) i/s - 11.956k in 5.102144s
new 0.707 (± 0.0%) i/s - 4.000 in 5.658492s
Comparison:
old: 2370.0 i/s
new: 0.7 i/s - 3351.73x slower
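To put the iterations-per-second figures in more familiar terms, here is a small sketch that converts them to time per call. The two input numbers are copied from the output above; the variable names are only illustrative. The ratio computed from the rounded figures differs slightly from the 3351.73x reported by benchmark-ips, presumably because the tool uses unrounded values.

```ruby
# Throughput figures copied from the benchmark output (iterations per second).
old_ips = 2370.0
new_ips = 0.707

# Time per call is the inverse of throughput.
old_ms_per_call = 1000.0 / old_ips
new_ms_per_call = 1000.0 / new_ips

# Slowdown computed from the rounded figures shown in the report.
slowdown = old_ips / new_ips

puts format("old: %.2f ms per call", old_ms_per_call)
puts format("new: %.0f ms per call", new_ms_per_call)
puts format("slowdown: ~%.0fx", slowdown)
```

In other words, the old implementation handles the response in well under a millisecond, while the new one takes roughly 1.4 seconds per call.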
Source of benchmark.rb:
#!/usr/bin/env ruby
# Run with `rails runner benchmark.rb`; the GitLab classes come from the Rails environment.

require 'benchmark/ips'

# Production search response for the term "zoekt", saved as JSON.
DATA = File.read("search_response.json")
RESPONSE = Gitlab::Search::Zoekt::Response.new(::Gitlab::Json.parse(DATA).with_indifferent_access)
PER_PAGE = Gitlab::SearchResults::DEFAULT_PER_PAGE
PAGE_LIMIT = 10

# Legacy implementation: one result per line match, grouped by page index.
def old_logic(response, per_page: PER_PAGE, page_limit: PAGE_LIMIT)
  results = {}
  i = 0

  response.each_file do |file|
    project_id = file[:RepositoryID].to_i

    # `each` returns the (truthy) array unless the block breaks out with false.
    cont = file[:LineMatches].each do |match|
      current_page = i / per_page
      break false if current_page == page_limit

      results[current_page] ||= []
      results[current_page] << {
        project_id: project_id,
        content: [match[:Before], match[:Line], match[:After]].compact.map do |l|
          Base64.decode64(l)
        end.join("\n"),
        line: match[:LineNumber],
        path: file[:FileName]
      }
      i += 1
    end

    # Stop iterating over files once the page limit has been reached.
    break unless cont
  end

  results
end

# New implementation used by the GraphQL API.
def new_logic(response)
  multi_match = Search::Zoekt::MultiMatch.new(nil)
  multi_match.zoekt_extract_result_pages_multi_match(response, PER_PAGE, PAGE_LIMIT)
end

Benchmark.ips do |x|
  x.report("old") { old_logic(RESPONSE) }
  x.report("new") { new_logic(RESPONSE) }
  x.compare!
end
Profile
Profiled with stackprof (https://github.com/tmm1/stackprof).
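For anyone who wants to reproduce the profile, here is a minimal sketch of wrapping a block with stackprof. The helper name `profile_to_file`, the output path, and the choice of `:wall` mode are assumptions for illustration, not what was used for this issue. The helper falls back to running the block unprofiled when the gem is not installed.

```ruby
# Hypothetical helper: profile a block with stackprof when the gem is available.
begin
  require 'stackprof'
  STACKPROF_AVAILABLE = true
rescue LoadError
  STACKPROF_AVAILABLE = false
end

# Runs the block, writing a stackprof dump to out_path if possible,
# and returns the block's own result either way.
def profile_to_file(out_path)
  result = nil
  if STACKPROF_AVAILABLE
    # mode: :wall samples wall-clock time; raw: true keeps raw samples
    # so the dump can also be rendered as a flamegraph.
    StackProf.run(mode: :wall, out: out_path, raw: true) { result = yield }
  else
    result = yield
  end
  result
end
```

Inside benchmark.rb this could be called as `profile_to_file('tmp/stackprof-new.dump') { new_logic(RESPONSE) }`, and the dump can then be inspected with `stackprof tmp/stackprof-new.dump --text`.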
Proposal
We should make the new implementation much faster. We probably won't be able to match the legacy implementation, since we have to parse more data to get the same number of pages, but the execution time should at least be reasonable.
Edited by Dmitry Gruzd