The performance of the Zoekt GraphQL API should be improved

Background

We've discovered that the new Zoekt GraphQL API is considerably slower than the legacy (original) implementation. We should profile it and improve its performance.

Benchmark

I took a production response for the search term zoekt and benchmarked the old and new implementations against it:

❯ rails runner benchmark.rb
----------------------------- LOADING PREVIEWS
----------------------------- LOADING PAGES
Warming up --------------------------------------
                 old   244.000  i/100ms
                 new     1.000  i/100ms
Calculating -------------------------------------
                 old      2.370k (± 8.6%) i/s -     11.956k in   5.102144s
                 new      0.707  (± 0.0%) i/s -      4.000  in   5.658492s

Comparison:
                 old:     2370.0 i/s
                 new:        0.7 i/s - 3351.73x  slower
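
To make the gap easier to read, the iterations-per-second figures above can be converted to time per call. This is only a small arithmetic sketch using the rounded rates from the output (the variable names are mine): it works out to roughly 0.42 ms per call for old and roughly 1.4 s per call for new. The ratio computed from these rounded figures differs slightly from the reported 3351.73x, presumably because the tool compares unrounded rates.

```ruby
# Rounded rates copied from the benchmark output above (iterations per second).
old_ips = 2370.0
new_ips = 0.707

# Time per call in milliseconds is the inverse of the rate.
old_ms = 1000.0 / old_ips
new_ms = 1000.0 / new_ips

puts format('old: %.2f ms per call', old_ms)
puts format('new: %.0f ms per call', new_ms)
puts format('ratio: about %.0fx', old_ips / new_ips)
```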
Source (benchmark.rb):
#!/usr/bin/env ruby

require 'benchmark/ips'

DATA = File.read("search_response.json")
RESPONSE = Gitlab::Search::Zoekt::Response.new(::Gitlab::Json.parse(DATA).with_indifferent_access)
PER_PAGE = Gitlab::SearchResults::DEFAULT_PER_PAGE
PAGE_LIMIT = 10

def old_logic(response, per_page: PER_PAGE, page_limit: PAGE_LIMIT)
  results = {}
  i = 0
  response.each_file do |file|
    project_id = file[:RepositoryID].to_i

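    # `each` returns the array (truthy) when it finishes; `break false` makes it
    # return false once the page limit is reached.
    cont = file[:LineMatches].each do |match|
      current_page = i / per_page
      break false if current_page == page_limit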

      results[current_page] ||= []
      results[current_page] << {
        project_id: project_id,
        content: [match[:Before], match[:Line], match[:After]].compact.map do |l|
          Base64.decode64(l)
        end.join("\n"),
        line: match[:LineNumber],
        path: file[:FileName]
      }
      i += 1
    end
    # Stop scanning further files once the page limit has been hit.
    break unless cont
  end

  results
end

def new_logic(response)
  multi_match = Search::Zoekt::MultiMatch.new(nil)
  multi_match.zoekt_extract_result_pages_multi_match(response, PER_PAGE, PAGE_LIMIT)
end

Benchmark.ips do |x|
  x.report("old") { old_logic(RESPONSE) }
  x.report("new") { new_logic(RESPONSE) }

  x.compare!
end

Attachment: search_response.json

Profile

Profiled with stackprof (https://github.com/tmm1/stackprof) in CPU mode.

Attachments: stackprof-cpu-zoekt.dump, flamegraph.html

[Image: stackprof output, mode: cpu]
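
The exact stackprof invocation isn't shown in this issue. As a rough sketch of how a CPU profile like the one attached could be captured, something along these lines should work. `with_cpu_profile` is a hypothetical helper and the dump path and workload are placeholders; the only stackprof call used is `StackProf.run` with its `mode`, `raw` and `out` options.

```ruby
require 'tmpdir'

# stackprof is a third-party gem; load it if present so the sketch still runs without it.
begin
  require 'stackprof'
rescue LoadError
  warn 'stackprof gem not installed; running the block unprofiled'
end

# Hypothetical helper (not part of the benchmark script): runs the block under
# StackProf's CPU sampler, writes the dump to out_path, and returns the block's value.
def with_cpu_profile(out_path)
  result = nil
  if defined?(StackProf)
    # raw: true keeps the raw samples, which flamegraph output needs.
    StackProf.run(mode: :cpu, raw: true, out: out_path) { result = yield }
  else
    result = yield
  end
  result
end

# Placeholder path and stand-in workload, for illustration only.
dump_path = File.join(Dir.tmpdir, 'stackprof-cpu-example.dump')
total = with_cpu_profile(dump_path) { (1..200_000).sum { |n| n % 7 } }
puts total
```

In the benchmark script the block would presumably wrap `new_logic(RESPONSE)` instead of the stand-in workload. The resulting dump can then be inspected with the `stackprof` command-line tool that ships with the gem.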

Proposal

We should make the new implementation much faster. We probably won't be able to match the legacy implementation, since the new code has to parse more data to produce the same number of pages, but the execution time should at least be reasonable.

Edited by Dmitry Gruzd