Skip to content
Snippets Groups Projects

Improve CI rules:exists pattern matching performance with cache

Merged Furkan Ayhan requested to merge 351593-cache-rules-exists into master

What does this MR do and why?

In rules:exists, if a file path is a pattern (including *), the path pattern is iterated through every single file in the repository via File.fnmatch?. This is an expensive operation so it is limited to MAX_PATTERN_COMPARISONS (10k for now) per pattern.

We want to increase this limit (#227632 (closed)) but we need to improve (#351593 (closed)) the performance problem first.

Here, we are adding a request-store caching per glob. With this, a glob is iterated only once. We can accept the fact that files do not change during a request or a worker processing.

This is behind a feature flag ci_rules_exists_pattern_matches_cache (#457914 (closed)).

Related to #351593 (closed)

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

The project has these files;

  • config/app.rb
  • hello1.yml
  • hello2.yml

and this CI config;

job1:
  script: echo Hello, World!
  rules:
    - exists:
      - 'docs/*.md' # does not match
      - 'config/*.rb' # matches

job2:
  script: echo Hello, World!
  rules:
    - exists:
      - 'docs/*.md' # does not match
      - '**/app.rb' # matches

job3:
  script: echo Hello, World!
  rules:
    - exists:
      - 'config/*.yml' # does not match
      - '**/app.rb' # matches

job4:
  script: echo Hello, World!
  rules:
    - exists:
      - '**/app.rb' # matches

When the feature flag is enabled:

  • File.fnmatch? runs for docs/*.md 3 times (the number of project files) because it iterates all files once.
  • File.fnmatch? runs for config/*.rb once because it iterates once and finds the file.
  • File.fnmatch? runs for config/*.yml 3 times (the number of project files) because it iterates all files once.
  • File.fnmatch? runs for **/app.rb once because it iterates once and finds the file.

When the feature flag is not enabled:

  • File.fnmatch? runs for docs/*.md 6 times (2 x the number of project files) because it iterates all files twice.
  • File.fnmatch? runs for config/*.rb once because it iterates once and finds the file.
  • File.fnmatch? runs for config/*.yml 3 times (the number of project files) because it iterates all files once.
  • File.fnmatch? runs for **/app.rb 3 times because it iterates once and finds the file three times.

Benchmark

# GITALY_DISABLE_REQUEST_LIMITS=true rails console

require 'benchmark'

ActiveRecord::Base.logger = nil

@project = Project.find(4) # test project with ~16k files
@paths = @project.repository.ls_files('master')
@glob = 'tests/test-*.c' # matches almost at the end of the list

def run_without_cache
  @paths.any? do |path|
    File.fnmatch?(@glob, path, File::FNM_PATHNAME | File::FNM_DOTMATCH | File::FNM_EXTGLOB)
  end
end

def run_with_cache
  Gitlab::SafeRequestStore.fetch("test123123_#{@glob}") do
    @paths.any? do |path|
      File.fnmatch?(@glob, path, File::FNM_PATHNAME | File::FNM_DOTMATCH | File::FNM_EXTGLOB)
    end
  end
end

::Gitlab::SafeRequestStore.ensure_request_store do
  times_to_run = 1000

  Benchmark.bm do |x|
    x.report('without cache') { times_to_run.times { run_without_cache } }
    x.report('with cache') { times_to_run.times { run_with_cache } }
  end
end; nil
               user       system     total        real
without cache  5.867844   0.011283   5.879127 (  5.897253)
with cache     0.010458   0.000036   0.010494 (  0.010554)
Edited by Furkan Ayhan

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • 1 Message
    :book: CHANGELOG missing:

    If this merge request needs a changelog entry, add the Changelog trailer to the commit message you want to add to the changelog.

    If this merge request doesn't need a CHANGELOG entry, feel free to ignore this message.

    Reviewer roulette

    Category Reviewer Maintainer
    backend @syasonik profile link current availability (UTC-4, 6 hours behind author) @mhamda profile link current availability (UTC+2, same timezone as author)
    ~"Verify" Reviewer review is optional for ~"Verify" @pedropombeiro profile link current availability (UTC+2, same timezone as author)

    Please check reviewer's status!

    • available Reviewer is available!
    • unavailable Reviewer is unavailable!

    Please refer to documentation page for guidance on how you can benefit from the Reviewer Roulette, or use the GitLab Review Workload Dashboard to find other available reviewers.

    If needed, you can retry the :repeat: danger-review job that generated this comment.

    Generated by :no_entry_sign: Danger

  • Furkan Ayhan changed the description

    changed the description

  • Furkan Ayhan added 1 commit

    added 1 commit

    • ba01515a - Improve CI rules:exists pattern matching performance with cache

    Compare with previous version

  • Furkan Ayhan requested review from @drew

    requested review from @drew

  • Furkan Ayhan resolved all threads

    resolved all threads

  • mentioned in issue #351593 (closed)

  • drew stachon
  • drew stachon
  • drew stachon
  • drew stachon removed review request for @drew

    removed review request for @drew

  • Furkan Ayhan requested review from @rshambhuni

    requested review from @rshambhuni

  • Furkan Ayhan added 1 commit

    added 1 commit

    • 2433a827 - Change FF actor and store key to project instead of pipeline

    Compare with previous version

  • Furkan Ayhan requested review from @drew

    requested review from @drew

  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Please register or sign in to reply
    Loading