Regexp support for rules:changes/exists
Add regexp: support to rules:changes and rules:exists
Closes #198688
What does this MR do?
Adds a regexp: key to rules:changes and rules:exists, allowing CI jobs to match changed or existing files using Ruby regular expressions instead of glob patterns.
regexp: and paths: are mutually exclusive - exactly one must be present. Using both or neither is a validation error.
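The "exactly one of" rule can be pictured with a minimal sketch. The method name `validate_rule_keys` and the error strings are hypothetical, not the validator this MR adds, and the sketch assumes the rule is written in hash form with symbol keys.

```ruby
# Hypothetical validator for illustration; not the MR's actual validation code.
# Returns a list of error messages (empty when the config is valid).
def validate_rule_keys(config)
  has_regexp = config.key?(:regexp)
  has_paths  = config.key?(:paths)

  # XOR: valid only when exactly one of the two keys is present.
  return [] if has_regexp ^ has_paths

  if has_regexp && has_paths
    ['regexp and paths are mutually exclusive']
  else
    ['one of regexp or paths is required']
  end
end
```

Under these assumptions, `validate_rule_keys({ regexp: '\Adocs/' })` returns an empty list, while a hash carrying both keys or neither key returns one error.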
Behaviour
When regexp: is provided, at least one file path must match the pattern for the rule to be satisfied.
Performance guards - matching over large change sets is bounded in two ways. Both are logged; the comparison limit fails open (returns true), while the time budget raises a configuration error:
- Comparison limit - mirroring the existing CHANGES_MAX_PATTERN_COMPARISONS (50,000) limit for glob matching, paths.size is checked against this limit before the pattern is evaluated, preventing unbounded iteration over very large change sets.
- Total time budget - the whole match loop is bounded by REGEXP_TOTAL_TIMEOUT_SECONDS (2s). Elapsed time is tracked across paths and, once the budget is exceeded, the rule raises a ParseError that fails the pipeline with a configuration error. This is the guard that bounds pipeline-creation time, since the per-match timeout alone only limits one path at a time.
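As a rough sketch of how the two guards could fit around the match loop: the constant names and values come from the description above, but `regexp_satisfied?`, the local `ParseError` stand-in, the injectable `clock`, and the strict `>` comparisons are assumptions made for illustration, and logging is omitted.

```ruby
# Stand-in for the pipeline's configuration error class (assumption).
class ParseError < StandardError; end

CHANGES_MAX_PATTERN_COMPARISONS = 50_000
REGEXP_TOTAL_TIMEOUT_SECONDS = 2

# Hypothetical helper: true when at least one path matches the pattern.
# `clock` is injectable so the time budget can be exercised without sleeping.
def regexp_satisfied?(pattern, paths,
                      max_comparisons: CHANGES_MAX_PATTERN_COMPARISONS,
                      budget: REGEXP_TOTAL_TIMEOUT_SECONDS,
                      clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
  # Comparison limit: fail open before the pattern is evaluated at all.
  return true if paths.size > max_comparisons

  regexp = Regexp.new(pattern)
  started_at = clock.call

  paths.any? do |path|
    # Total time budget: elapsed time is tracked across the whole loop.
    if clock.call - started_at > budget
      raise ParseError, 'rules regexp exceeded the total time budget'
    end

    regexp.match?(path)
  end
end
```

The asymmetry is deliberate in this sketch: an oversized change set returns true without compiling the pattern, while a blown time budget surfaces as an error rather than a silent pass.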
CI/CD variables
Variables in regexp: are expanded before matching, the same as rules:changes:paths and rules:exists:paths. For example, regexp: '\A(?!$DOCS_DIR/)' resolves $DOCS_DIR from the pipeline variables before the pattern runs.
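The expansion step from the example above can be illustrated with a small stand-in. `expand_variables` is a hypothetical helper, not GitLab's variable expansion code, and replacing unknown variables with an empty string is an assumption of this sketch.

```ruby
# Hypothetical stand-in for variable expansion; handles $VAR and ${VAR}.
# Unknown variables expand to an empty string (an assumption of this sketch).
def expand_variables(pattern, variables)
  pattern.gsub(/\$(?:\{(\w+)\}|(\w+))/) do
    name = Regexp.last_match(1) || Regexp.last_match(2)
    variables.fetch(name, '')
  end
end

expanded = expand_variables('\A(?!$DOCS_DIR/)', { 'DOCS_DIR' => 'docs' })
regexp = Regexp.new(expanded)

regexp.match?('src/app.rb')    # path outside docs/
regexp.match?('docs/index.md') # path inside docs/
```

With `DOCS_DIR` set to `docs`, the pattern becomes `\A(?!docs/)`, so the first call matches and the second does not.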
Feature flag
The feature is gated behind the ci_rules_regexp flag (gitlab_com_derisk type, disabled by default).
Fail-open behavior - when the flag is disabled, a job configured with regexp: evaluates to true (the job always runs) rather than falling through to broken glob logic. This makes the flag a pure rollout gate: users can write regexp: configs immediately and they will behave correctly once the flag is enabled, with no config changes required. This is consistent with the comparison-limit guard, which also returns true. The time budget and per-match timeout instead raise a configuration error, since a regexp that cannot finish in time is treated as a misconfiguration.
Security assumptions
User-provided regular expressions are a potential ReDoS vector. The following mitigations are applied in layers:
- Length limit - patterns longer than 255 characters are rejected at config validation time. Because CI/CD variables are expanded at runtime, the same 255-character cap is re-checked on the expanded pattern before it is compiled. An over-limit expanded pattern raises a pipeline error (rules:...:regexp is too long) so the user gets clear feedback, since a pattern that expands beyond the cap is a configuration error rather than a transient runtime condition. This means variable expansion cannot smuggle in an arbitrarily large pattern.
- Compile-time validation - Regexp.new(regexp) is called during YAML parsing to reject syntactically invalid patterns early, before any pipeline runs.
- Per-match timeout - at runtime, patterns are compiled with Regexp.new(pattern, timeout: REGEXP_TIMEOUT_SECONDS) (50ms; a Ruby 3.2+ feature). A single match that exceeds the timeout raises Regexp::TimeoutError, which is caught and re-raised as a ParseError so the pipeline fails with a clear configuration error instead of blocking a worker thread. This per-match timeout is the only thing that can interrupt a single in-progress match, so it works alongside the total time budget that bounds the loop as a whole.
- Full Ruby regexp engine - unlike rules:if which uses RE2 for user patterns, regexp: here uses Ruby's native engine with the timeout guard. This enables lookaheads and lookbehinds (the primary use case - e.g. \A(?!docs/) to match files outside a directory) while remaining safe under the timeout constraint.
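The layers above might combine roughly as follows. This is a sketch under stated assumptions, not the MR's implementation: `REGEXP_MAX_LENGTH`, `compile_user_regexp`, `safe_match?`, the local `ParseError` stand-in, and every error message except the "too long" one are hypothetical. The `timeout:` keyword requires Ruby 3.2 or later.

```ruby
# Stand-in for the pipeline's configuration error class (assumption).
class ParseError < StandardError; end

REGEXP_MAX_LENGTH = 255       # hypothetical constant name for the 255-character cap
REGEXP_TIMEOUT_SECONDS = 0.05 # 50ms per-match timeout

# Hypothetical helper: re-check the cap on the expanded pattern, then compile
# it with a per-match timeout (Ruby 3.2+).
def compile_user_regexp(expanded_pattern)
  if expanded_pattern.length > REGEXP_MAX_LENGTH
    raise ParseError, 'rules:...:regexp is too long'
  end

  Regexp.new(expanded_pattern, timeout: REGEXP_TIMEOUT_SECONDS)
rescue RegexpError => e
  # Syntactically invalid patterns surface as configuration errors.
  raise ParseError, "invalid regexp: #{e.message}"
end

# Hypothetical helper: convert a per-match timeout into a configuration error.
def safe_match?(regexp, path)
  regexp.match?(path)
rescue Regexp::TimeoutError
  raise ParseError, 'rules regexp timed out'
end

outside_docs = compile_user_regexp('\A(?!docs/)')
safe_match?(outside_docs, 'src/app.rb')    # path outside docs/
safe_match?(outside_docs, 'docs/index.md') # path inside docs/
```

The negative lookahead in the usage lines is the kind of pattern RE2 cannot express, which is why the native engine plus a timeout is used here instead.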