Skip to content

Improve CI rules:exists pattern matching performance with cache

Furkan Ayhan requested to merge 351593-cache-rules-exists into master

What does this MR do and why?

In rules:exists, if a file path is a pattern (including *), the path pattern is iterated through every single file in the repository via File.fnmatch?. This is an expensive operation so it is limited to MAX_PATTERN_COMPARISONS (10k for now) per pattern.

We want to increase this limit (#227632) but we need to improve (#351593 (closed)) the performance problem first.

Here, we are adding a request-store caching per glob. With this, a glob is iterated only once. We can accept the fact that files do not change during a request or a worker processing.

This is behind a feature flag ci_rules_exists_pattern_matches_cache (#457914 (closed)).

Related to #351593 (closed)

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

The project has these files;

  • config/app.rb
  • hello1.yml
  • hello2.yml

and this CI config;

job1:
  script: echo Hello, World!
  rules:
    - exists:
      - 'docs/*.md' # does not match
      - 'config/*.rb' # matches

job2:
  script: echo Hello, World!
  rules:
    - exists:
      - 'docs/*.md' # does not match
      - '**/app.rb' # matches

job3:
  script: echo Hello, World!
  rules:
    - exists:
      - 'config/*.yml' # does not match
      - '**/app.rb' # matches

job4:
  script: echo Hello, World!
  rules:
    - exists:
      - '**/app.rb' # matches

When the feature flag is enabled:

  • File.fnmatch? runs for docs/*.md 3 times (the number of project files) because it iterates all files once.
  • File.fnmatch? runs for config/*.rb once because it iterates once and finds the file.
  • File.fnmatch? runs for config/*.yml 3 times (the number of project files) because it iterates all files once.
  • File.fnmatch? runs for **/app.rb once because it iterates once and finds the file.

When the feature flag is not enabled:

  • File.fnmatch? runs for docs/*.md 6 times (2 x the number of project files) because it iterates all files twice.
  • File.fnmatch? runs for config/*.rb once because it iterates once and finds the file.
  • File.fnmatch? runs for config/*.yml 3 times (the number of project files) because it iterates all files once.
  • File.fnmatch? runs for **/app.rb 3 times because it iterates once and finds the file three times.

Benchmark

# GITALY_DISABLE_REQUEST_LIMITS=true rails console

require 'benchmark'

ActiveRecord::Base.logger = nil

@project = Project.find(4) # test project with ~16k files
@paths = @project.repository.ls_files('master')
@glob = 'tests/test-*.c' # matches almost at the end of the list

def run_without_cache
  @paths.any? do |path|
    File.fnmatch?(@glob, path, File::FNM_PATHNAME | File::FNM_DOTMATCH | File::FNM_EXTGLOB)
  end
end

def run_with_cache
  Gitlab::SafeRequestStore.fetch("test123123_#{@glob}") do
    @paths.any? do |path|
      File.fnmatch?(@glob, path, File::FNM_PATHNAME | File::FNM_DOTMATCH | File::FNM_EXTGLOB)
    end
  end
end

::Gitlab::SafeRequestStore.ensure_request_store do
  times_to_run = 1000

  Benchmark.bm do |x|
    x.report('without cache') { times_to_run.times { run_without_cache } }
    x.report('with cache') { times_to_run.times { run_with_cache } }
  end
end; nil
               user       system     total        real
without cache  5.867844   0.011283   5.879127 (  5.897253)
with cache     0.010458   0.000036   0.010494 (  0.010554)
Edited by Furkan Ayhan

Merge request reports

Loading