Add trie-indexed pattern matching for CODEOWNERS
What does this MR do and why?
Contributes to #594073 (closed)
Problem
Gitlab::CodeOwners::File#entries_for_path uses a brute-force
O(P × S × E) algorithm, calling File.fnmatch? for every
pattern in every section for each changed file path. For
GitLab's own CODEOWNERS (2,127 patterns across 82 sections),
a merge request touching 500 files triggers over 1 million
fnmatch calls, taking ~500ms.
Solution
Introduce Gitlab::CodeOwners::PatternIndex, which builds a
per-section directory-trie index from parsed CODEOWNERS data.
97% of patterns are anchored directory/file prefixes that can
be resolved via O(depth) trie walks instead of linear scans.
Only the ~3% of patterns with glob characters in their prefix
fall back to fnmatch.
Benchmarked against the real GitLab CODEOWNERS file and 99K repository paths: fnmatch calls reduced by 96.8%, wall-clock speedup of 3.4× (698ms → 208ms for 500 paths). One-time index build cost is ~4ms, amortized after ~12 path lookups.
How the PatternIndex optimization works
Imagine the CODEOWNERS file with 2,000+ rules like:
/app/models/ @backend-team
/app/views/ @frontend-team
/ee/lib/gitlab/ @ee-team
*.rb @ruby-ownerBefore (brute force): For every changed file in your MR, we open the CODEOWNERS and check every single rule to see if it matches. Change 500 files? That's 500 × 2,000 = 1,000,000 checks.
After (trie index): We build a lookup tree from the folder structure of the rules, once:
(root)
├── app/
│ ├── models/ → @backend-team
│ └── views/ → @frontend-team
└── ee/
└── lib/
└── gitlab/ → @ee-teamNow to check app/models/user.rb, we just walk root → app → models (2 lookups) and immediately find the matching rule. We never even look at the 1,998 other rules.
The ~3% of rules that use wildcards (like *.rb) can't be organized into folders, so those still get checked the old way - but that's 63 checks instead of 2,000.
Result: 97% fewer checks, 3.4× faster.
Benchmark
Results against the real .gitlab/CODEOWNERS
(82 sections, 2,127 patterns) and real repository paths
(99,491 files):
| Paths | Brute-force (per path) | PatternIndex (per path) | Speedup |
|---|---|---|---|
| 10 | 2.566ms | 1.409ms | 1.8× |
| 50 | 0.725ms | 0.387ms | 1.9× |
| 200 | 0.520ms | 0.168ms | 3.1× |
| 500 | 0.472ms | 0.139ms | 3.4× |
Correctness verified for all path counts — both approaches produce identical results.
Benchmark script and full output
Save as tmp/benchmark_codeowners_real.rb and run with
bundle exec ruby tmp/benchmark_codeowners_real.rb from the
project root:
#!/usr/bin/env ruby
# frozen_string_literal: true
require_relative '../config/environment'
require 'benchmark'
puts "=" * 70
puts "CODEOWNERS Pattern Matching Benchmark (real classes)"
puts "Brute-force vs PatternIndex"
puts "=" * 70
puts
codeowners_content = File.read(Rails.root.join('.gitlab', 'CODEOWNERS'))
BenchmarkBlob = Struct.new(:data, :path, :repository) do
def binary?
false
end
end
def make_file(content)
blob = BenchmarkBlob.new(content, 'CODEOWNERS', nil)
Gitlab::CodeOwners::File.new(blob)
end
file = make_file(codeowners_content)
total_sections = file.parsed_data.size
total_patterns = file.parsed_data.values.sum { |s| s.size }
puts "Sections: #{total_sections}"
puts "Total pattern entries: #{total_patterns}"
puts
paths_all = `git ls-files --full-name`.lines.map(&:chomp).reject(&:empty?)
puts "Total files in repo: #{paths_all.size}"
srand(42)
[10, 50, 200, 500].each do |path_count|
sampled = paths_all.sample(path_count)
puts
puts "-" * 70
puts "#{path_count} paths x #{total_sections} sections x #{total_patterns} patterns"
puts "-" * 70
file_check = make_file(codeowners_content)
mismatches = 0
sampled.each do |p|
bf_count = file_check.send(:entries_for_path_brute_force, p).size
pi_count = file_check.send(:pattern_index).entries_for_path(p).size
if bf_count != pi_count
mismatches += 1
puts " MISMATCH: #{p} -- bf=#{bf_count}, pi=#{pi_count}" if mismatches <= 3
end
end
puts mismatches > 0 ? "Correctness: FAILED (#{mismatches} mismatches)" : "Correctness: PASSED"
iterations = 3
Benchmark.bm(22) do |x|
x.report("brute-force") do
Feature.disable(:optimized_codeowners_pattern_matching)
iterations.times do
f = make_file(codeowners_content)
sampled.each { |p| f.entries_for_path(p) }
end
end
x.report("pattern-index (trie)") do
Feature.enable(:optimized_codeowners_pattern_matching)
iterations.times do
f = make_file(codeowners_content)
sampled.each { |p| f.entries_for_path(p) }
end
end
end
Feature.disable(:optimized_codeowners_pattern_matching)
bf_time = Benchmark.measure do
iterations.times do
f = make_file(codeowners_content)
sampled.each { |p| f.entries_for_path(p) }
end
end
Feature.enable(:optimized_codeowners_pattern_matching)
pi_time = Benchmark.measure do
iterations.times do
f = make_file(codeowners_content)
sampled.each { |p| f.entries_for_path(p) }
end
end
bf_per_path = (bf_time.real / (iterations * path_count) * 1000).round(3)
pi_per_path = (pi_time.real / (iterations * path_count) * 1000).round(3)
speedup = (bf_time.real / pi_time.real).round(1)
puts
puts "Per-path avg: brute-force=#{bf_per_path}ms, pattern-index=#{pi_per_path}ms"
puts "Speedup: #{speedup}x"
end
Feature.disable(:optimized_codeowners_pattern_matching)
puts
puts "=" * 70
puts "Done"
puts "=" * 70Full output:
======================================================================
CODEOWNERS Pattern Matching Benchmark (real classes)
Brute-force vs PatternIndex
======================================================================
Sections: 82
Total pattern entries: 2127
Total files in repo: 99491
----------------------------------------------------------------------
10 paths x 82 sections x 2127 patterns
----------------------------------------------------------------------
Correctness: PASSED
user system total real
brute-force 0.098039 0.026772 0.124811 ( 0.125673)
pattern-index (trie) 0.047715 0.009455 0.057170 ( 0.056699)
Per-path avg: brute-force=2.566ms, pattern-index=1.409ms
Speedup: 1.8x
----------------------------------------------------------------------
50 paths x 82 sections x 2127 patterns
----------------------------------------------------------------------
Correctness: PASSED
user system total real
brute-force 0.110403 0.008118 0.118521 ( 0.118273)
pattern-index (trie) 0.054229 0.001329 0.055558 ( 0.055411)
Per-path avg: brute-force=0.725ms, pattern-index=0.387ms
Speedup: 1.9x
----------------------------------------------------------------------
200 paths x 82 sections x 2127 patterns
----------------------------------------------------------------------
Correctness: PASSED
user system total real
brute-force 0.298528 0.001745 0.300273 ( 0.299960)
pattern-index (trie) 0.101122 0.006638 0.107760 ( 0.107573)
Per-path avg: brute-force=0.52ms, pattern-index=0.168ms
Speedup: 3.1x
----------------------------------------------------------------------
500 paths x 82 sections x 2127 patterns
----------------------------------------------------------------------
Correctness: PASSED
user system total real
brute-force 0.693829 0.004603 0.698432 ( 0.698119)
pattern-index (trie) 0.203061 0.005080 0.208141 ( 0.207814)
Per-path avg: brute-force=0.472ms, pattern-index=0.139ms
Speedup: 3.4x
======================================================================
Done
======================================================================References
- Feature flag:
optimized_codeowners_pattern_matching(gitlab_com_derisk, disabled by default) - Feature issue: #594073 (closed)
- Rollout issue: #601529
Screenshots or screen recordings
Not applicable (backend-only change, no UI).
How to set up and validate locally
- Enable the feature flag:
Feature.enable(:optimized_codeowners_pattern_matching) - Open any merge request with CODEOWNERS approval rules
- Verify that the approval rules resolve correctly
- Run the benchmark script above to confirm performance improvement
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.