
Fix some SAST jobs running unexpectedly in projects with many files

Manuel Grabowski requested to merge mg-fix-sast-ci-exist-rules-20230825 into master

What does this MR do and why?

https://docs.gitlab.com/ee/ci/yaml/#rulesexists has an important limitation:

For performance reasons, GitLab performs a maximum of 10,000 checks against exists patterns or file paths. After the 10,000th check, rules with patterned globs always match. In other words, the exists rule always assumes a match in projects with more than 10,000 files, or if there are fewer than 10,000 files but the exists rules are checked more than 10,000 times.

This maximum is "shared" among the globs in a single rules:exists: block because every file has to be compared against each glob individually. That means that with n different globs, the job will always be included in the pipeline for any project with more than 10000 / n files.

You can see this in effect here: https://gitlab.com/gl-demo-ultimate-mgrabowski/sast-rules-test (850 files and nothing but the standard SAST include). The resulting pipeline contains a SAST job even though the project contains no matching files. The reason is the 12 different globs in the job definition: 10000 / 12 ≈ 833.3, so the limit is already exceeded at 834 files. That is not a lot of files.
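For reference, a .gitlab-ci.yml with nothing but the standard SAST include would presumably look like the sketch below. The demo project's actual file is not reproduced here, so this is an assumption based on the documented template path.

```yaml
# Assumed minimal pipeline configuration: only the standard SAST template,
# no other jobs or overrides.
include:
  - template: Security/SAST.gitlab-ci.yml
```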

There is a workaround, though: the limit applies not to the rules: block as a whole but to each individual rule. By splitting the globs into separate rules, one glob per rule, each glob can still use the full limit.

Compare these two examples that illustrate this:

  • This config with "split globs" leads to a pipeline that does not include the job that has no matching files in the repository
  • This config with all three globs in one rule leads to a pipeline that does include the job that has no matching files in the repository

(This repo has 3400 *.py files in it, because 10000 / 3 globs = 3333 files maximum before running into the problem.)
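As a rough illustration of the two configurations compared above, here is a minimal sketch. It is not the content of the linked examples or of the SAST template: the job names, scripts, and glob patterns are hypothetical, and the per-rule budget behavior is as described in this MR.

```yaml
# Hypothetical jobs for illustration only; names, scripts, and globs are assumptions.

# All three globs in one rule: they share a single 10,000-check budget,
# so the rule matches unconditionally once the project has more than
# roughly 10000 / 3 files.
scan-combined:
  script:
    - echo "scanning"
  rules:
    - exists:
        - '**/*.rb'
        - '**/*.go'
        - '**/*.java'

# One glob per rule: each rule gets its own budget, so each glob can be
# compared against up to 10,000 files before the rule matches unconditionally.
scan-split:
  script:
    - echo "scanning"
  rules:
    - exists:
        - '**/*.rb'
    - exists:
        - '**/*.go'
    - exists:
        - '**/*.java'
```

Because rules are evaluated in order and the first matching rule adds the job, the split form keeps the same "any of these globs" semantics as the combined form.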


This MR is mainly to draw attention to the issue (which we discovered in a real-world case in 🎫 #440360 (internal)). The only other affected template is Dependency-Scanning.latest.gitlab-ci.yml, but due to the use of reference the fix isn't quite as straightforward there.

The fix for this template should also be made cleaner in terms of reformatting the comments etc., hence a draft MR to first discuss whether this fix is desired at all or whether we would rather leave the template unchanged and document the limitation instead. I've also updated the comment structure to account for the fact that most jobs would now have multiple rules: lines.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Manuel Grabowski
