Improve CI rules:exists pattern matching performance with cache
What does this MR do and why?
In rules:exists
, if a file path is a pattern (including *
), the path
pattern is iterated through every single file in the repository via
File.fnmatch?
. This is an expensive operation so it is limited to
MAX_PATTERN_COMPARISONS
(10k for now) per pattern.
We want to increase this limit (#227632 (closed)) but we need to improve (#351593 (closed)) the performance problem first.
Here, we are adding a request-store caching per glob. With this, a glob is iterated only once. We can accept the fact that files do not change during a request or a worker processing.
This is behind a feature flag ci_rules_exists_pattern_matches_cache
(#457914 (closed)).
Related to #351593 (closed)
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Screenshots or screen recordings
The project has these files;
config/app.rb
hello1.yml
hello2.yml
and this CI config;
job1:
script: echo Hello, World!
rules:
- exists:
- 'docs/*.md' # does not match
- 'config/*.rb' # matches
job2:
script: echo Hello, World!
rules:
- exists:
- 'docs/*.md' # does not match
- '**/app.rb' # matches
job3:
script: echo Hello, World!
rules:
- exists:
- 'config/*.yml' # does not match
- '**/app.rb' # matches
job4:
script: echo Hello, World!
rules:
- exists:
- '**/app.rb' # matches
When the feature flag is enabled:
-
File.fnmatch?
runs fordocs/*.md
3 times (the number of project files) because it iterates all files once. -
File.fnmatch?
runs forconfig/*.rb
once because it iterates once and finds the file. -
File.fnmatch?
runs forconfig/*.yml
3 times (the number of project files) because it iterates all files once. -
File.fnmatch?
runs for**/app.rb
once because it iterates once and finds the file.
When the feature flag is not enabled:
-
File.fnmatch?
runs fordocs/*.md
6 times (2 x the number of project files) because it iterates all files twice. -
File.fnmatch?
runs forconfig/*.rb
once because it iterates once and finds the file. -
File.fnmatch?
runs forconfig/*.yml
3 times (the number of project files) because it iterates all files once. -
File.fnmatch?
runs for**/app.rb
3 times because it iterates once and finds the file three times.
Benchmark
# GITALY_DISABLE_REQUEST_LIMITS=true rails console
require 'benchmark'
ActiveRecord::Base.logger = nil
@project = Project.find(4) # test project with ~16k files
@paths = @project.repository.ls_files('master')
@glob = 'tests/test-*.c' # matches almost at the end of the list
def run_without_cache
@paths.any? do |path|
File.fnmatch?(@glob, path, File::FNM_PATHNAME | File::FNM_DOTMATCH | File::FNM_EXTGLOB)
end
end
def run_with_cache
Gitlab::SafeRequestStore.fetch("test123123_#{@glob}") do
@paths.any? do |path|
File.fnmatch?(@glob, path, File::FNM_PATHNAME | File::FNM_DOTMATCH | File::FNM_EXTGLOB)
end
end
end
::Gitlab::SafeRequestStore.ensure_request_store do
times_to_run = 1000
Benchmark.bm do |x|
x.report('without cache') { times_to_run.times { run_without_cache } }
x.report('with cache') { times_to_run.times { run_with_cache } }
end
end; nil
user system total real
without cache 5.867844 0.011283 5.879127 ( 5.897253)
with cache 0.010458 0.000036 0.010494 ( 0.010554)
Merge request reports
Activity
changed milestone to %17.0
assigned to @furkanayhan
- A deleted user
added feature flag feature flagexists labels
- Resolved by Furkan Ayhan
1 Message CHANGELOG missing: If this merge request needs a changelog entry, add the
Changelog
trailer to the commit message you want to add to the changelog.If this merge request doesn't need a CHANGELOG entry, feel free to ignore this message.
Reviewer roulette
Category Reviewer Maintainer backend @syasonik
(UTC-4, 6 hours behind author)
@mhamda
(UTC+2, same timezone as author)
~"Verify" Reviewer review is optional for ~"Verify" @pedropombeiro
(UTC+2, same timezone as author)
Please check reviewer's status!
Please refer to documentation page for guidance on how you can benefit from the Reviewer Roulette, or use the GitLab Review Workload Dashboard to find other available reviewers.
If needed, you can retry the
danger-review
job that generated this comment.Generated by
Dangeradded 1 commit
- ba01515a - Improve CI rules:exists pattern matching performance with cache
- Resolved by Marius Bobin
@drew Would you like to do the first review of this? This is kind of related to #450687 (which you were interested) and the new "ci platform" group. Then, maybe we can pass it to
@mbobin
for the final review.
requested review from @drew
mentioned in issue #351593 (closed)
- Resolved by drew stachon
- Resolved by Furkan Ayhan
- Resolved by drew stachon
removed review request for @drew
requested review from @rshambhuni
added 1 commit
- 2433a827 - Change FF actor and store key to project instead of pipeline
requested review from @drew