[Reintroduce] Add fork and archive filters to search API
## What does this MR do and why?
This MR reinstates the changes from Zoekt search API should handle fork/archived fi... (#493281), which were reverted due to incident gitlab-com/gl-infra/production#20935 (closed).
### Root cause of incident
The incident was opened for timeouts on blob (code) search, which occurred for both advanced and exact code search. The original MR added an additional association to the scope used for preloading results (`with_api_commit_entity_associations`), a method shared by both commit and blob searches.
### Changes introduced
This MR includes:

- reverting the revert
- putting the new filters behind a derisk feature flag
- creating a new preload method specifically for blobs that does not contain the new association
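Because the filters sit behind a derisk flag, they can be switched off at runtime without another revert. A rough sketch of the gating pattern, with a made-up flag name and a local stand-in for GitLab's `Feature.enabled?` check:

```ruby
# Hypothetical sketch of gating the new search filters behind a feature
# flag. FeatureFlags is a stand-in for GitLab's Feature.enabled?; the flag
# name is illustrative, not the real one.
module FeatureFlags
  ENABLED = { zoekt_fork_archived_filters: true }.freeze

  def self.enabled?(name)
    ENABLED.fetch(name, false)
  end
end

# Only merge the new filter params when the flag is on; drop unset values.
def apply_filters(params, filters)
  return filters unless FeatureFlags.enabled?(:zoekt_fork_archived_filters)

  filters.merge(
    include_archived: params[:include_archived],
    exclude_forks: params[:exclude_forks]
  ).compact
end

puts apply_filters({ include_archived: true }, {}).inspect
```

Disabling the flag makes `apply_filters` a no-op, which is the derisk escape hatch if blob search regresses again.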
## References
Original issue + MR
- Zoekt search API should handle fork/archived fi... (#493281)
- Add fork and archive filters to search API (!211973 - merged)
Revert issue + MR
- Reinstate reverted MR: Add fork and archive fil... (#583048)
- Revert "Merge branch '493281-add-fork-archived-... (!215382 - merged)
Also related: Advanced search API should handle archived filters (#493282)
## Screenshots or screen recordings
| Before | After |
|---|---|
## How to set up and validate locally
You can test with basic, advanced, or exact code search; see the GDK setup documentation for setting up Zoekt and Elasticsearch.
Set up a group with one non-archived project, one archived project, and one forked project that has some data (at least an issue and some code). Then run searches at the global, group, and project levels.
| level | search type | what filters should work |
|---|---|---|
| global | basic | include_archived |
| group | basic | include_archived |
| project | basic | NONE - include_archived should be ignored when searching in an archived project |
| global | advanced | include_archived |
| group | advanced | include_archived |
| project | advanced | NONE - include_archived should be ignored when searching in an archived project |
| global (blobs scope only) | zoekt | include_archived, exclude_forks |
| group (blobs scope only) | zoekt | include_archived, exclude_forks |
| project (blobs scope only) | zoekt | NONE - should be ignored when searching in an archived or forked project |
I used Duo to help me write a test script (I also did some manual verification locally, but it's a lot to do by hand).
NOTE: I had to turn off API rate limits for authenticated search requests to run the script.
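If you need to do the same locally, something along these lines in a Rails console should work (`search_rate_limit` is the application setting name as I recall it; verify it against your GitLab version):

```ruby
# Run inside `gdk rails console`. Raises the per-user search API rate
# limit so a scripted test run is not throttled. Setting name assumed.
Gitlab::CurrentSettings.update!(search_rate_limit: 10_000)
```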
<details><summary>Click to expand script</summary>

```ruby
#!/usr/bin/env ruby
require 'net/http'
require 'json'
require 'uri'

API_KEY = ENV['GDK_API_KEY']
BASE_URL = 'https://gdk.test:3443'

# Project paths
GROUP_PATH = 126       # group-with-archived-and-forked
ARCHIVED_PROJECT = 29  # group-with-archived-and-forked/archived
FORKED_PROJECT = 30    # group-with-archived-and-forked/gitlab-test-forked
NORMAL_PROJECT = 28    # group-with-archived-and-forked/non-archived

if API_KEY.nil? || API_KEY.empty?
  puts "Error: GDK_API_KEY environment variable not set"
  exit 1
end

class SearchTester
  def initialize
    @results = []
  end

  def test_all
    puts "Testing search filters..."
    puts "=" * 80

    # Issues scope tests (basic/advanced at global/group/project)
    test_issues_scope

    # Blobs scope tests (zoekt at global/group/project)
    test_blobs_scope

    print_summary
  end

  private

  def test_issues_scope
    ['basic', 'advanced'].each do |search_type|
      # Global level
      test_search(
        level: 'global',
        search_type: search_type,
        scope: 'issues',
        search_term: 'test 2025-12-05',
        filter_tests: [:no_filter, :include_archived_true, :include_archived_false]
      )

      # Group level
      test_search(
        level: 'group',
        search_type: search_type,
        scope: 'issues',
        search_term: 'test 2025-12-05',
        group_id: GROUP_PATH,
        filter_tests: [:no_filter, :include_archived_true, :include_archived_false]
      )

      # Project level (archived)
      test_search(
        level: 'project (archived)',
        search_type: search_type,
        scope: 'issues',
        search_term: 'test 2025-12-05',
        project_id: ARCHIVED_PROJECT,
        filter_tests: [:no_filter, :include_archived_true, :include_archived_false]
      )

      # Project level (normal)
      test_search(
        level: 'project (normal)',
        search_type: search_type,
        scope: 'issues',
        search_term: 'test 2025-12-05',
        project_id: NORMAL_PROJECT,
        filter_tests: [:no_filter, :include_archived_true, :include_archived_false]
      )
    end
  end

  def test_blobs_scope
    test_term = '"Suggestions for a good README"'

    # Global level
    test_search(
      level: 'global',
      search_type: 'zoekt',
      scope: 'blobs',
      search_term: test_term,
      filter_tests: [
        :no_filter,
        :include_archived_true,
        :include_archived_false,
        :exclude_forks_true,
        :exclude_forks_false,
        :both_filters
      ]
    )

    # Group level
    test_search(
      level: 'group',
      search_type: 'zoekt',
      scope: 'blobs',
      search_term: test_term,
      group_id: GROUP_PATH,
      filter_tests: [
        :no_filter,
        :include_archived_true,
        :include_archived_false,
        :exclude_forks_true,
        :exclude_forks_false,
        :both_filters
      ]
    )

    # Project level (archived)
    test_search(
      level: 'project (archived)',
      search_type: 'zoekt',
      scope: 'blobs',
      search_term: test_term,
      project_id: ARCHIVED_PROJECT,
      filter_tests: [:no_filter, :include_archived_true, :exclude_forks_true]
    )

    # Project level (forked)
    test_search(
      level: 'project (forked)',
      search_type: 'zoekt',
      scope: 'blobs',
      search_term: test_term,
      project_id: FORKED_PROJECT,
      filter_tests: [:no_filter, :include_archived_true, :exclude_forks_true]
    )

    # Project level (normal)
    test_search(
      level: 'project (normal)',
      search_type: 'zoekt',
      scope: 'blobs',
      search_term: test_term,
      project_id: NORMAL_PROJECT,
      filter_tests: [:no_filter, :include_archived_true, :exclude_forks_true]
    )
  end

  def test_search(level:, search_type:, scope:, search_term:, filter_tests:, group_id: nil, project_id: nil)
    filter_tests.each do |filter_test|
      params = build_params(search_type, scope, search_term, filter_test, group_id, project_id)
      response = make_request(params)

      status = response.code.to_i
      result_count = parse_result_count(response, scope)
      expected = expected_result(level, scope, filter_test, project_id)
      passed = evaluate_result(status, result_count, expected, response.body, scope)

      @results << {
        level: level,
        search_type: search_type,
        scope: scope,
        filter: filter_description(filter_test),
        status: status,
        count: result_count,
        expected: expected,
        passed: passed
      }

      print '.'
    end
  end

  def build_params(search_type, scope, search_term, filter_test, group_id, project_id)
    params = {
      search_type: search_type,
      scope: scope,
      search: search_term
    }
    params[:group_id] = group_id if group_id
    params[:project_id] = project_id if project_id

    case filter_test
    when :include_archived_true
      params[:include_archived] = 'true'
    when :include_archived_false
      params[:include_archived] = 'false'
    when :exclude_forks_true
      params[:exclude_forks] = 'true'
    when :exclude_forks_false
      params[:exclude_forks] = 'false'
    when :both_filters
      params[:include_archived] = 'true'
      params[:exclude_forks] = 'true'
    end

    params
  end

  def make_request(params)
    uri = URI("#{BASE_URL}/api/v4/search")
    uri.query = URI.encode_www_form(params)

    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = true
    http.verify_mode = OpenSSL::SSL::VERIFY_NONE

    request = Net::HTTP::Get.new(uri)
    request['PRIVATE-TOKEN'] = API_KEY

    http.request(request)
  end

  def parse_result_count(response, scope)
    return 0 unless response.code.to_i == 200

    body = JSON.parse(response.body)
    body.is_a?(Array) ? body.length : 0
  rescue JSON::ParserError
    0
  end

  def expected_result(level, scope, filter_test, project_id)
    # At project level, filters should be ignored - always expect results if data exists
    if level.start_with?('project')
      return 'results (filter ignored)'
    end

    # Global/Group level
    case filter_test
    when :no_filter
      if scope == 'issues'
        'no archived results'
      else
        'no archived, includes forks'
      end
    when :include_archived_true
      'includes archived'
    when :include_archived_false
      'no archived results'
    when :exclude_forks_true
      'no forked results'
    when :exclude_forks_false
      'includes forks'
    when :both_filters
      'includes archived, no forks'
    end
  end

  def evaluate_result(status, result_count, expected, response_body, scope)
    # Always need 200
    return false unless status == 200

    # For project level, just need 200 (filters ignored)
    return true if expected == 'results (filter ignored)'

    # Parse the response to check actual project paths
    begin
      results = JSON.parse(response_body)
      return true if results.empty? # Empty results are valid

      # Extract project paths from results
      has_archived = results.any? { |r|
        path = r.dig('project', 'path_with_namespace') || r['path_with_namespace']
        path&.include?('archived')
      }
      has_forked = results.any? { |r|
        path = r.dig('project', 'path_with_namespace') || r['path_with_namespace']
        path&.include?('forked')
      }

      # Validate based on expected behavior
      case expected
      when 'no archived results'
        !has_archived
      when 'includes archived'
        true # We allow archived, don't require it
      when 'no forked results'
        !has_forked
      when 'includes forks'
        true # We allow forks, don't require them
      when 'no archived, includes forks'
        !has_archived
      when 'includes archived, no forks'
        !has_forked
      else
        true
      end
    rescue JSON::ParserError
      false
    end
  end

  def filter_description(filter_test)
    case filter_test
    when :no_filter then 'none'
    when :include_archived_true then 'include_archived=true'
    when :include_archived_false then 'include_archived=false'
    when :exclude_forks_true then 'exclude_forks=true'
    when :exclude_forks_false then 'exclude_forks=false'
    when :both_filters then 'both filters=true'
    end
  end

  def print_summary
    puts "\n\n"
    puts "=" * 120
    puts "SUMMARY"
    puts "=" * 120
    printf "%-20s %-10s %-10s %-25s %-8s %-8s %-30s %-6s\n",
           "Level", "Type", "Scope", "Filter", "Status", "Count", "Expected", "Result"
    puts "-" * 120

    @results.each do |r|
      result_text = r[:passed] ? "✓ PASS" : "✗ FAIL"
      printf "%-20s %-10s %-10s %-25s %-8s %-8s %-30s %-6s\n",
             r[:level], r[:search_type], r[:scope], r[:filter],
             r[:status], r[:count], r[:expected], result_text
    end

    puts "=" * 120
    passed = @results.count { |r| r[:passed] }
    total = @results.length
    puts "\nTotal: #{passed}/#{total} passed"
  end
end

tester = SearchTester.new
tester.test_all
```

</details>
```plaintext
========================================================================================================================
SUMMARY
========================================================================================================================
Level                Type       Scope      Filter                    Status   Count    Expected                       Result
------------------------------------------------------------------------------------------------------------------------
global               basic      issues     none                      200      1        no archived results            ✓ PASS
global               basic      issues     include_archived=true     200      2        includes archived              ✓ PASS
global               basic      issues     include_archived=false    200      1        no archived results            ✓ PASS
group                basic      issues     none                      200      1        no archived results            ✓ PASS
group                basic      issues     include_archived=true     200      2        includes archived              ✓ PASS
group                basic      issues     include_archived=false    200      1        no archived results            ✓ PASS
project (archived)   basic      issues     none                      200      1        results (filter ignored)       ✓ PASS
project (archived)   basic      issues     include_archived=true     200      2        results (filter ignored)       ✓ PASS
project (archived)   basic      issues     include_archived=false    200      1        results (filter ignored)       ✓ PASS
project (normal)     basic      issues     none                      200      1        results (filter ignored)       ✓ PASS
project (normal)     basic      issues     include_archived=true     200      2        results (filter ignored)       ✓ PASS
project (normal)     basic      issues     include_archived=false    200      1        results (filter ignored)       ✓ PASS
global               advanced   issues     none                      200      0        no archived results            ✓ PASS
global               advanced   issues     include_archived=true     200      0        includes archived              ✓ PASS
global               advanced   issues     include_archived=false    200      0        no archived results            ✓ PASS
group                advanced   issues     none                      200      0        no archived results            ✓ PASS
group                advanced   issues     include_archived=true     200      0        includes archived              ✓ PASS
group                advanced   issues     include_archived=false    200      0        no archived results            ✓ PASS
project (archived)   advanced   issues     none                      200      0        results (filter ignored)       ✓ PASS
project (archived)   advanced   issues     include_archived=true     200      0        results (filter ignored)       ✓ PASS
project (archived)   advanced   issues     include_archived=false    200      0        results (filter ignored)       ✓ PASS
project (normal)     advanced   issues     none                      200      0        results (filter ignored)       ✓ PASS
project (normal)     advanced   issues     include_archived=true     200      0        results (filter ignored)       ✓ PASS
project (normal)     advanced   issues     include_archived=false    200      0        results (filter ignored)       ✓ PASS
global               zoekt      blobs      none                      200      10       no archived, includes forks    ✓ PASS
global               zoekt      blobs      include_archived=true     200      11       includes archived              ✓ PASS
global               zoekt      blobs      include_archived=false    200      10       no archived results            ✓ PASS
global               zoekt      blobs      exclude_forks=true        200      10       no forked results              ✓ PASS
global               zoekt      blobs      exclude_forks=false       200      11       includes forks                 ✓ PASS
global               zoekt      blobs      both filters=true         200      11       includes archived, no forks    ✓ PASS
group                zoekt      blobs      none                      200      10       no archived, includes forks    ✓ PASS
group                zoekt      blobs      include_archived=true     200      11       includes archived              ✓ PASS
group                zoekt      blobs      include_archived=false    200      10       no archived results            ✓ PASS
group                zoekt      blobs      exclude_forks=true        200      10       no forked results              ✓ PASS
group                zoekt      blobs      exclude_forks=false       200      11       includes forks                 ✓ PASS
group                zoekt      blobs      both filters=true         200      11       includes archived, no forks    ✓ PASS
project (archived)   zoekt      blobs      none                      200      10       results (filter ignored)       ✓ PASS
project (archived)   zoekt      blobs      include_archived=true     200      11       results (filter ignored)       ✓ PASS
project (archived)   zoekt      blobs      exclude_forks=true        200      10       results (filter ignored)       ✓ PASS
project (forked)     zoekt      blobs      none                      200      10       results (filter ignored)       ✓ PASS
project (forked)     zoekt      blobs      include_archived=true     200      11       results (filter ignored)       ✓ PASS
project (forked)     zoekt      blobs      exclude_forks=true        200      10       results (filter ignored)       ✓ PASS
project (normal)     zoekt      blobs      none                      200      10       results (filter ignored)       ✓ PASS
project (normal)     zoekt      blobs      include_archived=true     200      11       results (filter ignored)       ✓ PASS
project (normal)     zoekt      blobs      exclude_forks=true        200      10       results (filter ignored)       ✓ PASS
========================================================================================================================

Total: 45/45 passed
```
## MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Related to #583048