Regexp support for rules:changes/exists

Add regexp: support to rules:changes and rules:exists

Closes #198688

What does this MR do?

Adds a regexp: key to rules:changes and rules:exists, allowing CI jobs to match changed or existing files using Ruby regular expressions instead of glob patterns.

regexp: and paths: are mutually exclusive - exactly one must be present. Using both or neither is a validation error.
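A minimal sketch of the mutual-exclusivity check, assuming the changes:/exists: entry arrives as a plain Ruby hash with symbol keys. The method name validate_changes_entry and its error strings are hypothetical, not the MR's actual validator.

```ruby
# Hypothetical validator: exactly one of :paths or :regexp must be present.
# Returns an array of error messages (empty when the entry is valid).
def validate_changes_entry(entry)
  present = %i[paths regexp].select { |key| entry.key?(key) }

  case present.size
  when 1 then []
  when 0 then ["must contain either paths: or regexp:"]
  else ["paths: and regexp: are mutually exclusive"]
  end
end

validate_changes_entry({ paths: ["docs/**/*"] })      # => []
validate_changes_entry({ regexp: '\Adocs/' })         # => []
validate_changes_entry({})                            # => ["must contain either paths: or regexp:"]
validate_changes_entry({ paths: ["a"], regexp: "b" }) # => ["paths: and regexp: are mutually exclusive"]
```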

Behaviour

When regexp: is provided, at least one file path must match the pattern for the rule to be satisfied.
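The rule reduces to an any? over the candidate paths (the changed files for rules:changes, the repository's files for rules:exists). A minimal sketch with a hypothetical regexp_rule_satisfied? helper and made-up file lists; the guards described below are deliberately left out here.

```ruby
# Hypothetical helper: the rule is satisfied when at least one path matches.
# match? is used because only a boolean is needed (no MatchData is built).
def regexp_rule_satisfied?(pattern, paths)
  regexp = Regexp.new(pattern)
  paths.any? { |path| regexp.match?(path) }
end

changed = ["docs/index.md", "app/models/user.rb"]

regexp_rule_satisfied?('\.rb\z', changed)  # => true  (app/models/user.rb)
regexp_rule_satisfied?('\Aspec/', changed) # => false (nothing under spec/)
regexp_rule_satisfied?('\.rb\z', [])       # => false (no paths to match)
```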

Performance guards - matching over large change sets is bounded in two ways. Both are logged; the comparison limit fails open (returns true), while the time budget raises a configuration error:

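  1. Comparison limit - mirroring the existing CHANGES_MAX_PATTERN_COMPARISONS (50,000) limit for glob matching. A single regexp is compared once per path, so paths.size is the comparison count; it is checked against the limit before the pattern is evaluated, and an over-limit change set skips matching and returns true. This prevents unbounded iteration over very large change sets.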
  2. Total time budget - the whole match loop is bounded by REGEXP_TOTAL_TIMEOUT_SECONDS (2s). Elapsed time is tracked across paths and, once the budget is exceeded, the rule raises a ParseError that fails the pipeline with a configuration error. This is the guard that bounds pipeline-creation time: the per-match timeout alone limits only one path at a time, and at 50ms per path a 50,000-path change set could otherwise take up to 2,500s.
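Both guards, plus the per-match timeout, can be sketched as a single loop. This is an illustration under assumptions rather than the MR's code: the constant names and values come from the description, ParseError is a local stand-in class, the monotonic clock is one reasonable way to track elapsed time, and logging is omitted. The timeout: keyword requires Ruby 3.2+.

```ruby
# Local stand-in for the real error class (hypothetical here).
class ParseError < StandardError; end

CHANGES_MAX_PATTERN_COMPARISONS = 50_000
REGEXP_TOTAL_TIMEOUT_SECONDS = 2.0
REGEXP_TIMEOUT_SECONDS = 0.05

def matches_any_path?(pattern, paths)
  # Guard 1: one regexp means one comparison per path, so paths.size is the
  # comparison count. Over the limit, fail open without evaluating the pattern.
  return true if paths.size > CHANGES_MAX_PATTERN_COMPARISONS

  # Per-match timeout (Ruby 3.2+): interrupts a single runaway match.
  regexp = Regexp.new(pattern, timeout: REGEXP_TIMEOUT_SECONDS)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

  paths.any? do |path|
    # Guard 2: the budget is checked before each match, bounding the whole loop.
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    if elapsed > REGEXP_TOTAL_TIMEOUT_SECONDS
      raise ParseError, "regexp: exceeded the total time budget"
    end

    regexp.match?(path)
  end
rescue Regexp::TimeoutError
  # A single match ran past REGEXP_TIMEOUT_SECONDS: report a config error.
  raise ParseError, "regexp: timed out while matching a path"
end
```

Because the budget is only checked between matches, one in-progress match can overrun it by at most the per-match timeout, which is why the two limits are layered.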

CI/CD variables

Variables in regexp: are expanded before matching, the same as rules:changes:paths and rules:exists:paths. For example, regexp: '\A(?!$DOCS_DIR/)' resolves $DOCS_DIR from the pipeline variables before the pattern runs.
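Because variables are substituted into the pattern text before compilation, $DOCS_DIR in the example becomes a literal part of the regexp. A rough sketch, assuming a simplified, hypothetical expand_variables that handles only the $NAME and ${NAME} forms and leaves unknown names untouched (an arbitrary choice for the sketch); GitLab's real expansion rules are not reproduced here.

```ruby
# Hypothetical, simplified expander: replaces $NAME and ${NAME} with values
# from `variables`; unknown names are left as-is.
def expand_variables(pattern, variables)
  pattern.gsub(/\$\{(\w+)\}|\$(\w+)/) do |match|
    name = Regexp.last_match(1) || Regexp.last_match(2)
    variables.fetch(name, match)
  end
end

variables = { "DOCS_DIR" => "docs" }
pattern = expand_variables('\A(?!$DOCS_DIR/)', variables) # => "\\A(?!docs/)"

regexp = Regexp.new(pattern)
regexp.match?("docs/index.md")      # => false (inside docs/)
regexp.match?("app/models/user.rb") # => true  (outside docs/)
```

In this sketch a variable's value becomes regexp source, so characters such as . in it act as metacharacters; wrapping the value in Regexp.escape would make it literal.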

Feature flag

The feature is gated behind the ci_rules_regexp flag (gitlab_com_derisk type, disabled by default).

Fail-open behavior - when the flag is disabled, a job configured with regexp: evaluates to true (the job always runs) rather than falling through to the glob logic, which has no paths: to match against. This makes the flag a pure rollout gate: users can write regexp: configs immediately, and they will behave correctly once the flag is enabled, with no config changes required. This is consistent with the comparison-limit guard, which also returns true. The time budget and per-match timeout instead raise a configuration error, since a regexp that cannot finish in time is treated as a misconfiguration.
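The fail-open rule for the flag can be sketched as an early return ahead of the matching logic. The rule_satisfied? wrapper is hypothetical, the flag_enabled keyword stands in for the real ci_rules_regexp lookup so the snippet runs without GitLab, and the glob branch is a simplified File.fnmatch? stand-in, not GitLab's glob implementation.

```ruby
# Hypothetical wrapper; flag_enabled stands in for the feature-flag lookup.
def rule_satisfied?(entry, paths, flag_enabled:)
  if entry.key?(:regexp)
    # Flag off: fail open so the job runs, instead of falling through to
    # glob matching with no paths: to compare against.
    return true unless flag_enabled

    regexp = Regexp.new(entry[:regexp])
    return paths.any? { |path| regexp.match?(path) }
  end

  # Simplified glob branch for entries that use paths:.
  flags = File::FNM_PATHNAME | File::FNM_EXTGLOB
  entry[:paths].any? do |glob|
    paths.any? { |path| File.fnmatch?(glob, path, flags) }
  end
end

rule_satisfied?({ regexp: '\Anever\z' }, ["a.rb"], flag_enabled: false) # => true (fail open)
rule_satisfied?({ regexp: '\Anever\z' }, ["a.rb"], flag_enabled: true)  # => false
```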

Security assumptions

User-provided regular expressions are a potential ReDoS vector. The following mitigations are applied in layers:

  1. Length limit - patterns longer than 255 characters are rejected at config validation time. Because CI/CD variables are expanded at runtime, the same 255-character cap is re-checked on the expanded pattern before it is compiled. A pattern that expands beyond the cap is treated as a configuration error rather than a transient runtime condition, so it fails the pipeline with clear feedback (rules:...:regexp is too long). As a result, variable expansion cannot smuggle in an arbitrarily large pattern.
  2. Compile-time validation - Regexp.new(regexp) is called during YAML parsing to reject syntactically invalid patterns early, before any pipeline runs.
  3. Per-match timeout - at runtime, patterns are compiled with Regexp.new(pattern, timeout: REGEXP_TIMEOUT_SECONDS) (50ms; a Ruby 3.2+ feature). A single match that exceeds the timeout raises Regexp::TimeoutError, which is caught and re-raised as a ParseError so the pipeline fails with a clear configuration error instead of blocking a worker thread. This per-match timeout is the only thing that can interrupt a single in-progress match, so it complements the total time budget, which bounds the loop as a whole.
  4. Full Ruby regexp engine - unlike rules:if, which uses RE2 for user patterns, regexp: here uses Ruby's native engine with the timeout guard. This enables lookaheads and lookbehinds (the primary use case, e.g. \A(?!docs/) to match files outside a directory) while the timeout keeps pathological patterns bounded.
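The config-time checks in items 1 and 2 can be sketched together. validate_regexp and REGEXP_MAX_LENGTH are hypothetical names, and the sketch returns error strings rather than raising; only the 255-character limit and the Regexp.new syntax check come from the description. The same length check would run again on the expanded pattern before runtime compilation.

```ruby
REGEXP_MAX_LENGTH = 255 # limit from the description; constant name is hypothetical

# Hypothetical config-time validator: length first, then a compile check so
# syntax errors are reported before any pipeline runs.
def validate_regexp(pattern)
  if pattern.length > REGEXP_MAX_LENGTH
    return ["regexp is too long (maximum is #{REGEXP_MAX_LENGTH} characters)"]
  end

  Regexp.new(pattern)
  []
rescue RegexpError => e
  ["regexp is invalid: #{e.message}"]
end

validate_regexp('\A(?!docs/)') # => [] (lookahead is valid in Ruby's engine)
validate_regexp("(unclosed")   # one "regexp is invalid: ..." message
validate_regexp("a" * 256)     # => ["regexp is too long (maximum is 255 characters)"]
```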

Edited by Oleg Yakovenko
