[Reintroduce] Add fork and archive filters to search API

What does this MR do and why?

This MR reinstates the changes from Zoekt search API should handle fork/archived fi... (#493281), which were reverted due to incident gitlab-com/gl-infra/production#20935 (closed).

Root cause of incident

The incident was opened for timeouts on blobs (code) search. The timeouts occurred for both advanced and exact code search. The original MR added an association to the scope used for preloading results (with_api_commit_entity_associations). Because this scope was shared by the commits and blobs searches, blobs searches also preloaded the new association.
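
The effect of sharing one preload list can be modelled in plain Ruby. This is only an illustration, not the GitLab code: `build_preloads` and the association names `:route`, `:namespace`, and `:forked_from_project` are hypothetical stand-ins, since the description does not name the association that was added.

```ruby
# Plain-Ruby model of a preload list used by two search scopes.
# All association names here are hypothetical placeholders.
def build_preloads(shared:)
  commit_preloads = %i[route namespace]

  # shared: true  -> blobs reuses the very same list object as commits
  # shared: false -> blobs gets its own copy, taken before the list grows
  blob_preloads = shared ? commit_preloads : commit_preloads.dup

  # Stand-in for the extra association added for the commits entity
  commit_preloads << :forked_from_project

  { commits: commit_preloads, blobs: blob_preloads }
end
```

With `shared: true`, the blobs list picks up the extra association as a side effect. With `shared: false`, it stays unchanged.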

Changes introduced

This MR includes:

  • reverting the revert
  • putting the new filters behind a derisk feature flag
  • creating a new preload method specifically for blobs that does not contain the new association
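
As a rough sketch of the second item, gating the new filters behind a flag could look like the following. The helper `effective_filters` and the `flag_enabled:` argument are hypothetical; the description does not name the flag, and real GitLab code would consult its feature flag framework instead of taking a boolean.

```ruby
# Hypothetical sketch: the new filter params only take effect when the
# derisk flag is on; otherwise they are dropped and behaviour is unchanged.
NEW_FILTER_KEYS = %i[include_archived exclude_forks].freeze

def effective_filters(params, flag_enabled:)
  return params if flag_enabled

  # Flag off: strip the new filters, keep every other param as-is
  params.reject { |key, _value| NEW_FILTER_KEYS.include?(key) }
end
```

With the flag off, `effective_filters({ search: 'readme', exclude_forks: 'true' }, flag_enabled: false)` keeps only the `search` param.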

References

Original issue + MR

Revert issue + MR

See also: Advanced search API should handle archived filters (#493282)

How to set up and validate locally

You can test with basic, advanced, or exact code search. See the GDK setup documentation for setting up Zoekt and Elasticsearch.

Set up a group with one non-archived project, one archived project, and one forked project, each with some data (at least one issue and some code). Run searches at the global, group, and project levels.

| level | search type | what filters should work |
| --- | --- | --- |
| global | basic | include_archived |
| group | basic | include_archived |
| project | basic | NONE - include_archived should be ignored when searching in an archived project |
| global | advanced | include_archived |
| group | advanced | include_archived |
| project | advanced | NONE - include_archived should be ignored when searching in an archived project |
| global (blobs scope only) | zoekt | include_archived, exclude_forks |
| group (blobs scope only) | zoekt | include_archived, exclude_forks |
| project (blobs scope only) | zoekt | NONE - should be ignored when searching in an archived or forked project |
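
The table above can also be read as a small lookup. The sketch below encodes it in plain Ruby; the constant and helper names (`FILTERS_BY_SEARCH_TYPE`, `applicable_filters`, `strip_ignored_filters`) are hypothetical and are not taken from the GitLab codebase.

```ruby
# Filters expected to take effect per search type, copied from the table.
FILTERS_BY_SEARCH_TYPE = {
  'basic' => %i[include_archived],
  'advanced' => %i[include_archived],
  'zoekt' => %i[include_archived exclude_forks] # blobs scope only
}.freeze

ALL_FILTERS = %i[include_archived exclude_forks].freeze

# Returns the filters that should have an effect for a given level and type.
def applicable_filters(level:, search_type:)
  # Project-level searches ignore both filters
  return [] if level == 'project'

  FILTERS_BY_SEARCH_TYPE.fetch(search_type, [])
end

# Drops filter params that the table says should be ignored.
def strip_ignored_filters(params, level:, search_type:)
  allowed = applicable_filters(level: level, search_type: search_type)
  params.reject { |key, _value| ALL_FILTERS.include?(key) && !allowed.include?(key) }
end
```

For example, `applicable_filters(level: 'group', search_type: 'zoekt')` returns both filters, while any project-level call returns an empty list.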

I used Duo to help me write a test script. I also verified some cases manually, but it's a lot to do by hand.

NOTE: I had to turn off API limits for searches for authenticated users to run the script

Test script:
#!/usr/bin/env ruby

require 'net/http'
require 'openssl' # for OpenSSL::SSL::VERIFY_NONE in make_request
require 'json'
require 'uri'

API_KEY = ENV['GDK_API_KEY']
BASE_URL = 'https://gdk.test:3443'

# Group and project IDs in the local GDK (full paths in the trailing comments)
GROUP_PATH = 126       # group-with-archived-and-forked (numeric ID, despite the name)
ARCHIVED_PROJECT = 29  # group-with-archived-and-forked/archived
FORKED_PROJECT = 30    # group-with-archived-and-forked/gitlab-test-forked
NORMAL_PROJECT = 28    # group-with-archived-and-forked/non-archived

if API_KEY.nil? || API_KEY.empty?
  puts "Error: GDK_API_KEY environment variable not set"
  exit 1
end

class SearchTester
  def initialize
    @results = []
  end

  def test_all
    puts "Testing search filters..."
    puts "=" * 80
    
    # Issues scope tests (basic/advanced at global/group/project)
    test_issues_scope
    
    # Blobs scope tests (zoekt at global/group/project)
    test_blobs_scope
    
    print_summary
  end

  private

  def test_issues_scope
    ['basic', 'advanced'].each do |search_type|
      # Global level
      test_search(
        level: 'global',
        search_type: search_type,
        scope: 'issues',
        search_term: 'test 2025-12-05',
        filter_tests: [:no_filter, :include_archived_true, :include_archived_false]
      )
      
      # Group level
      test_search(
        level: 'group',
        search_type: search_type,
        scope: 'issues',
        search_term: 'test 2025-12-05',
        group_id: GROUP_PATH,
        filter_tests: [:no_filter, :include_archived_true, :include_archived_false]
      )
      
      # Project level (archived)
      test_search(
        level: 'project (archived)',
        search_type: search_type,
        scope: 'issues',
        search_term: 'test 2025-12-05',
        project_id: ARCHIVED_PROJECT,
        filter_tests: [:no_filter, :include_archived_true, :include_archived_false]
      )
      
      # Project level (normal)
      test_search(
        level: 'project (normal)',
        search_type: search_type,
        scope: 'issues',
        search_term: 'test 2025-12-05',
        project_id: NORMAL_PROJECT,
        filter_tests: [:no_filter, :include_archived_true, :include_archived_false]
      )
    end
  end

  def test_blobs_scope
    test_term = '"Suggestions for a good README"'
    # Global level
    test_search(
      level: 'global',
      search_type: 'zoekt',
      scope: 'blobs',
      search_term: test_term,
      filter_tests: [
        :no_filter,
        :include_archived_true,
        :include_archived_false,
        :exclude_forks_true,
        :exclude_forks_false,
        :both_filters
      ]
    )
    
    # Group level
    test_search(
      level: 'group',
      search_type: 'zoekt',
      scope: 'blobs',
      search_term: test_term,
      group_id: GROUP_PATH,
      filter_tests: [
        :no_filter,
        :include_archived_true,
        :include_archived_false,
        :exclude_forks_true,
        :exclude_forks_false,
        :both_filters
      ]
    )
    
    # Project level (archived)
    test_search(
      level: 'project (archived)',
      search_type: 'zoekt',
      scope: 'blobs',
      search_term: test_term,
      project_id: ARCHIVED_PROJECT,
      filter_tests: [:no_filter, :include_archived_true, :exclude_forks_true]
    )
    
    # Project level (forked)
    test_search(
      level: 'project (forked)',
      search_type: 'zoekt',
      scope: 'blobs',
      search_term: test_term,
      project_id: FORKED_PROJECT,
      filter_tests: [:no_filter, :include_archived_true, :exclude_forks_true]
    )
    
    # Project level (normal)
    test_search(
      level: 'project (normal)',
      search_type: 'zoekt',
      scope: 'blobs',
      search_term: test_term,
      project_id: NORMAL_PROJECT,
      filter_tests: [:no_filter, :include_archived_true, :exclude_forks_true]
    )
  end

  def test_search(level:, search_type:, scope:, search_term:, filter_tests:, group_id: nil, project_id: nil)
    filter_tests.each do |filter_test|
      params = build_params(search_type, scope, search_term, filter_test, group_id, project_id)
      response = make_request(params)
      
      status = response.code.to_i
      result_count = parse_result_count(response, scope)
      
      expected = expected_result(level, scope, filter_test, project_id)
      passed = evaluate_result(status, result_count, expected, response.body, scope)
      
      @results << {
        level: level,
        search_type: search_type,
        scope: scope,
        filter: filter_description(filter_test),
        status: status,
        count: result_count,
        expected: expected,
        passed: passed
      }
      
      print '.'
    end
  end

  def build_params(search_type, scope, search_term, filter_test, group_id, project_id)
    params = {
      search_type: search_type,
      scope: scope,
      search: search_term
    }
    
    params[:group_id] = group_id if group_id
    params[:project_id] = project_id if project_id
    
    case filter_test
    when :include_archived_true
      params[:include_archived] = 'true'
    when :include_archived_false
      params[:include_archived] = 'false'
    when :exclude_forks_true
      params[:exclude_forks] = 'true'
    when :exclude_forks_false
      params[:exclude_forks] = 'false'
    when :both_filters
      params[:include_archived] = 'true'
      params[:exclude_forks] = 'true'
    end
    
    params
  end

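  def make_request(params)
    # Group and project searches have their own endpoints, so move the ID
    # from the query string into the path.
    query = params.dup
    project_id = query.delete(:project_id)
    group_id = query.delete(:group_id)
    path = if project_id then "/api/v4/projects/#{project_id}/search"
           elsif group_id then "/api/v4/groups/#{group_id}/search"
           else '/api/v4/search'
           end
    uri = URI("#{BASE_URL}#{path}")
    uri.query = URI.encode_www_form(query)

    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = true
    http.verify_mode = OpenSSL::SSL::VERIFY_NONE # local GDK certificate only

    request = Net::HTTP::Get.new(uri)
    request['PRIVATE-TOKEN'] = API_KEY

    http.request(request)
  end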

  def parse_result_count(response, scope)
    return 0 unless response.code.to_i == 200
    
    body = JSON.parse(response.body)
    body.is_a?(Array) ? body.length : 0
  rescue JSON::ParserError
    0
  end

  def expected_result(level, scope, filter_test, project_id)
    # At project level, filters should be ignored - always expect results if data exists
    if level.start_with?('project')
      return 'results (filter ignored)'
    end
    
    # Global/Group level
    case filter_test
    when :no_filter
      if scope == 'issues'
        'no archived results'
      else
        'no archived, includes forks'
      end
    when :include_archived_true
      'includes archived'
    when :include_archived_false
      'no archived results'
    when :exclude_forks_true
      'no forked results'
    when :exclude_forks_false
      'includes forks'
    when :both_filters
      'includes archived, no forks'
    end
  end

  def evaluate_result(status, result_count, expected, response_body, scope)
    # Always need 200
    return false unless status == 200

    # For project level, just need 200 (filters ignored)
    return true if expected == 'results (filter ignored)'

    results = JSON.parse(response_body)
    return true if results.empty? # Empty results are valid

    # Match on project_id rather than a path substring: the group path and
    # the "non-archived" project path both contain "archived", so a substring
    # check cannot tell the test projects apart. This assumes each result
    # exposes project_id, as issue and blob results do.
    has_archived = results.any? { |r| r['project_id'] == ARCHIVED_PROJECT }
    has_forked = results.any? { |r| r['project_id'] == FORKED_PROJECT }

    # Validate based on expected behavior
    case expected
    when 'no archived results', 'no archived, includes forks'
      !has_archived
    when 'no forked results', 'includes archived, no forks'
      !has_forked
    else
      true # 'includes archived' / 'includes forks': allowed, not required
    end
  rescue JSON::ParserError
    false
  end

  def filter_description(filter_test)
    case filter_test
    when :no_filter then 'none'
    when :include_archived_true then 'include_archived=true'
    when :include_archived_false then 'include_archived=false'
    when :exclude_forks_true then 'exclude_forks=true'
    when :exclude_forks_false then 'exclude_forks=false'
    when :both_filters then 'both filters=true'
    end
  end

  def print_summary
    puts "\n\n"
    puts "=" * 120
    puts "SUMMARY"
    puts "=" * 120
    
    printf "%-20s %-10s %-10s %-25s %-8s %-8s %-30s %-6s\n",
           "Level", "Type", "Scope", "Filter", "Status", "Count", "Expected", "Result"
    puts "-" * 120
    
    @results.each do |r|
      result_text = r[:passed] ? "✓ PASS" : "✗ FAIL"
      printf "%-20s %-10s %-10s %-25s %-8s %-8s %-30s %-6s\n",
             r[:level], r[:search_type], r[:scope], r[:filter],
             r[:status], r[:count], r[:expected], result_text
    end
    
    puts "=" * 120
    
    passed = @results.count { |r| r[:passed] }
    total = @results.length
    puts "\nTotal: #{passed}/#{total} passed"
  end
end

tester = SearchTester.new
tester.test_all

Script output:

========================================================================================================================
SUMMARY
========================================================================================================================
Level                Type       Scope      Filter                    Status   Count    Expected                       Result
------------------------------------------------------------------------------------------------------------------------
global               basic      issues     none                      200      1        no archived results            ✓ PASS
global               basic      issues     include_archived=true     200      2        includes archived              ✓ PASS
global               basic      issues     include_archived=false    200      1        no archived results            ✓ PASS
group                basic      issues     none                      200      1        no archived results            ✓ PASS
group                basic      issues     include_archived=true     200      2        includes archived              ✓ PASS
group                basic      issues     include_archived=false    200      1        no archived results            ✓ PASS
project (archived)   basic      issues     none                      200      1        results (filter ignored)       ✓ PASS
project (archived)   basic      issues     include_archived=true     200      2        results (filter ignored)       ✓ PASS
project (archived)   basic      issues     include_archived=false    200      1        results (filter ignored)       ✓ PASS
project (normal)     basic      issues     none                      200      1        results (filter ignored)       ✓ PASS
project (normal)     basic      issues     include_archived=true     200      2        results (filter ignored)       ✓ PASS
project (normal)     basic      issues     include_archived=false    200      1        results (filter ignored)       ✓ PASS
global               advanced   issues     none                      200      0        no archived results            ✓ PASS
global               advanced   issues     include_archived=true     200      0        includes archived              ✓ PASS
global               advanced   issues     include_archived=false    200      0        no archived results            ✓ PASS
group                advanced   issues     none                      200      0        no archived results            ✓ PASS
group                advanced   issues     include_archived=true     200      0        includes archived              ✓ PASS
group                advanced   issues     include_archived=false    200      0        no archived results            ✓ PASS
project (archived)   advanced   issues     none                      200      0        results (filter ignored)       ✓ PASS
project (archived)   advanced   issues     include_archived=true     200      0        results (filter ignored)       ✓ PASS
project (archived)   advanced   issues     include_archived=false    200      0        results (filter ignored)       ✓ PASS
project (normal)     advanced   issues     none                      200      0        results (filter ignored)       ✓ PASS
project (normal)     advanced   issues     include_archived=true     200      0        results (filter ignored)       ✓ PASS
project (normal)     advanced   issues     include_archived=false    200      0        results (filter ignored)       ✓ PASS
global               zoekt      blobs      none                      200      10       no archived, includes forks    ✓ PASS
global               zoekt      blobs      include_archived=true     200      11       includes archived              ✓ PASS
global               zoekt      blobs      include_archived=false    200      10       no archived results            ✓ PASS
global               zoekt      blobs      exclude_forks=true        200      10       no forked results              ✓ PASS
global               zoekt      blobs      exclude_forks=false       200      11       includes forks                 ✓ PASS
global               zoekt      blobs      both filters=true         200      11       includes archived, no forks    ✓ PASS
group                zoekt      blobs      none                      200      10       no archived, includes forks    ✓ PASS
group                zoekt      blobs      include_archived=true     200      11       includes archived              ✓ PASS
group                zoekt      blobs      include_archived=false    200      10       no archived results            ✓ PASS
group                zoekt      blobs      exclude_forks=true        200      10       no forked results              ✓ PASS
group                zoekt      blobs      exclude_forks=false       200      11       includes forks                 ✓ PASS
group                zoekt      blobs      both filters=true         200      11       includes archived, no forks    ✓ PASS
project (archived)   zoekt      blobs      none                      200      10       results (filter ignored)       ✓ PASS
project (archived)   zoekt      blobs      include_archived=true     200      11       results (filter ignored)       ✓ PASS
project (archived)   zoekt      blobs      exclude_forks=true        200      10       results (filter ignored)       ✓ PASS
project (forked)     zoekt      blobs      none                      200      10       results (filter ignored)       ✓ PASS
project (forked)     zoekt      blobs      include_archived=true     200      11       results (filter ignored)       ✓ PASS
project (forked)     zoekt      blobs      exclude_forks=true        200      10       results (filter ignored)       ✓ PASS
project (normal)     zoekt      blobs      none                      200      10       results (filter ignored)       ✓ PASS
project (normal)     zoekt      blobs      include_archived=true     200      11       results (filter ignored)       ✓ PASS
project (normal)     zoekt      blobs      exclude_forks=true        200      10       results (filter ignored)       ✓ PASS
========================================================================================================================

Total: 45/45 passed
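
The script treats "includes archived" and "includes forks" as passing without requiring a matching hit. One possible stricter check, sketched here under the assumption that the test data does not change between requests, compares result counts across filter values. The helper names are hypothetical, and `rows` stands for the result hashes of a single level and search type.

```ruby
# rows: hashes with :filter and :count, like the entries the script collects,
# already narrowed to one level and one search type.
def count_for(rows, filter)
  row = rows.find { |r| r[:filter] == filter }
  row && row[:count]
end

# include_archived=true should never return fewer results than =false.
def archived_filter_widens?(rows)
  with_archived = count_for(rows, 'include_archived=true')
  without_archived = count_for(rows, 'include_archived=false')
  return false if with_archived.nil? || without_archived.nil?

  with_archived >= without_archived
end

# exclude_forks=true should never return more results than =false.
def fork_filter_narrows?(rows)
  without_forks = count_for(rows, 'exclude_forks=true')
  with_forks = count_for(rows, 'exclude_forks=false')
  return false if without_forks.nil? || with_forks.nil?

  without_forks <= with_forks
end
```

Applied to the global zoekt rows in the summary above (11 vs 10 for include_archived, 10 vs 11 for exclude_forks), both checks hold.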

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #583048

Edited by Terri Chu
