Regex variables do not handle interpolated values correctly

As of #35438 (closed), it is possible to store a regex pattern in a variable and then use it in an 'if' condition in a rules or only clause. However, if the variable is interpolated, i.e. its value contains a reference to another variable, this isn't handled correctly: the regex behaves as if the dollar sign and variable name were still there.

Test 1

Reproduction:

.gitlab-ci.yml:

variables:
  ABC: "aabc"
  B: "b"
  REGEX_LITERAL: '/^a+bc$/'
  REGEX_INTERPOLATED: '/^a+${B}c$/'


stages:
  - 'test'


########
# Jobs #
########

.test-job:
  stage: 'test'
  tags:
    - 'build'
  script:
    - "exit 0"

test-job-inline_regex_literal:
  extends: '.test-job'
  rules:
    - if: '$ABC =~ /^a+bc$/'
      when: 'manual'
    - when: 'never'

test-job-inline_regex_interpolated:
  extends: '.test-job'
  rules:
    - if: '$ABC =~ /^a+${B}c$/'
      when: 'manual'
    - when: 'never'

test-job-var_regex_literal:
  extends: '.test-job'
  rules:
    - if: '$ABC =~ $REGEX_LITERAL'
      when: 'manual'
    - when: 'never'

test-job-var_regex_interpolated:
  extends: '.test-job'
  rules:
    - if: '$ABC =~ $REGEX_INTERPOLATED'
      when: 'manual'
    - when: 'never'


echo-vars:
  stage: 'test'
  rules:
    - when: 'always'
  tags:
    - 'build'
  script:
    - |
      for var in 'ABC' 'B' 'REGEX_LITERAL' 'REGEX_INTERPOLATED' ; do
        python3 -c 'import sys, os; var_name = sys.argv[1]; var_val = os.getenv(var_name); print(f"${var_name}: {var_val!r}")' "${var}"
      done

Expected behaviour:

  • Jobs test-job-inline_regex_literal, test-job-var_regex_literal, and test-job-var_regex_interpolated run
    • The documentation clearly states that variables may be defined in terms of other variables, and this has been borne out in the past by their behaviour in non-regex contexts.
  • Job test-job-inline_regex_interpolated may or may not run
    • There is an existing issue, #209904, about supporting variable references within regex patterns.

Actual behaviour:

  • Jobs test-job-inline_regex_literal and test-job-var_regex_literal run
  • Jobs test-job-var_regex_interpolated and test-job-inline_regex_interpolated do not run
  • The output of echo-vars confirms that $REGEX_LITERAL and $REGEX_INTERPOLATED have identical values:
    [...]
    Executing "step_script" stage of the job script
    00:00
    $ for var in 'ABC' 'B' 'REGEX_LITERAL' 'REGEX_INTERPOLATED' ; do # collapsed multi-line command
    $ABC: 'aabc'
    $B: 'b'
    $REGEX_LITERAL: '/^a+bc$/'
    $REGEX_INTERPOLATED: '/^a+bc$/'
    Cleaning up file based variables
    Job succeeded

Test 2

Reproduction:

.gitlab-ci.yml:

variables:
  AB: "aab"
  A_NEWLINE: "aa\n"
  '1': "b"
  REGEX_INTERPOLATED: '/(?m)^a+${1}/'


stages:
  - 'test'


########
# Jobs #
########

.test-job:
  stage: 'test'
  tags:
    - 'build'
  script:
    - "exit 0"

test-job-ab:
  extends: '.test-job'
  rules:
    - if: '$AB =~ $REGEX_INTERPOLATED'
      when: 'manual'
    - when: 'never'

test-job-a_newline:
  extends: '.test-job'
  rules:
    - if: '$A_NEWLINE =~ $REGEX_INTERPOLATED'
      when: 'manual'
    - when: 'never'

Expected behaviour:

  • Job test-job-ab runs
  • Job test-job-a_newline does not run

Actual behaviour:

  • Job test-job-ab does not run
  • Job test-job-a_newline runs
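    • This shows that the regex is behaving as if the dollar sign and variable name were still there, rather than, e.g., being interpolated to a blank value, or crashing the regex parser. With the reference left in place, the dollar sign is an end-of-line anchor and {1} is a repetition quantifier, so the pattern matches "aa" followed by a line end but not "aab".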
  • I was unable to use an echo-vars job to print the variable values, because '1' is not a legal variable name in Bash, so the job would crash without printing anything
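
The Test 2 result can be reproduced outside GitLab. The following is a minimal sketch, assuming the rule evaluator uses RE2-style syntax, which Go's regexp package follows. The helper name matches is hypothetical, and the patterns are written without the surrounding slashes used in the CI file.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether value matches pattern.
// Hypothetical helper for illustration; not part of GitLab.
func matches(pattern, value string) bool {
	return regexp.MustCompile(pattern).MatchString(value)
}

func main() {
	// Pattern exactly as stored in REGEX_INTERPOLATED: "$" is an
	// end-of-line anchor and "{1}" is a repetition quantifier.
	raw := `(?m)^a+${1}`
	// Pattern after the expected substitution of ${1} with "b".
	interpolated := `(?m)^a+b`

	for _, value := range []string{"aab", "aa\n"} {
		fmt.Printf("%q raw=%v interpolated=%v\n",
			value, matches(raw, value), matches(interpolated, value))
	}
	// Output:
	// "aab" raw=false interpolated=true
	// "aa\n" raw=true interpolated=false
}
```

The raw column matches the actual behaviour reported above, and the interpolated column matches the expected behaviour.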

Notes:

There may be a concern that enabling variable interpolation in regexes will break existing code by causing substitutions to be performed where not intended, because a pattern coincidentally contains a dollar sign followed by the name of a variable. On that subject, I have two suggestions:

  • Within regexes, require that variable references use ${varname} rather than $varname. Unless the name of the variable is an integer, curly brackets directly after a dollar sign aren't a valid regex construct (with an integer, {n} is a repetition quantifier).
    • However, this isn't a complete solution: [${var}] could either be a character class containing a variable substitution or a character class matching 'v', 'a', 'r', dollar-sign, and both curly-brackets
  • Interpolation in regex literals might break existing code, but regex variables have only been around for a week, so there should be very little existing code to break
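
The first suggestion could be sketched roughly as follows. This is an illustration only, not how GitLab implements or would implement it: the helper interpolate and the identifier pattern are hypothetical, and escaping the substituted value with regexp.QuoteMeta is an assumption rather than something proposed above.

```go
package main

import (
	"fmt"
	"regexp"
)

// bracedRef matches ${NAME} where NAME starts with a letter or underscore,
// so "${1}" (anchor followed by a quantifier) is never treated as a reference.
var bracedRef = regexp.MustCompile(`\$\{([A-Za-z_][A-Za-z0-9_]*)\}`)

// interpolate is a hypothetical helper: it replaces each ${NAME} with the
// escaped value of NAME, and leaves unknown names and bare $NAME untouched.
func interpolate(pattern string, vars map[string]string) string {
	return bracedRef.ReplaceAllStringFunc(pattern, func(ref string) string {
		name := bracedRef.FindStringSubmatch(ref)[1]
		value, ok := vars[name]
		if !ok {
			return ref
		}
		// Assumption: values are inserted as literal text, not as regex syntax.
		return regexp.QuoteMeta(value)
	})
}

func main() {
	vars := map[string]string{"B": "b"}
	fmt.Println(interpolate(`^a+${B}c$`, vars))   // ^a+bc$
	fmt.Println(interpolate(`(?m)^a+${1}`, vars)) // (?m)^a+${1}
	fmt.Println(interpolate(`[${B}]`, vars))      // [b]
}
```

The last call shows the character-class ambiguity mentioned above: the sketch always treats [${B}] as a substitution, so a pattern that meant the literal characters would change silently.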

Rationale:

The real-world use-case for which variable interpolation would be useful:

  • My team has code spread across multiple GitLab repos. All of these repos use CI/CD pipelines derived from a single shared template.
  • These pipelines support various tags which can be added to a Git commit message to control pipeline behaviour - force a pipeline for a non-master branch, control which deployment steps run automatically or manually, ignore errors in the unit-test step, etc.
  • These tags include the project name, to ensure a tag intended for one repo doesn't affect another. However, the project name is the only part of these regexes which differs between repos, and it is the same for all tag regexes in a given repo. It would therefore be very useful if we could define the project name once, as a global variable, and have it inserted in all the places it is needed.
    • Indeed, the original conception of the template was that all per-repo configuration would be confined to setting values in the global variables section of .gitlab-ci.yml, and that the rest of the file would be identical in every repo, in order to make it as simple as possible to keep them all in sync as the template evolved. Until recently, this was impossible because you couldn't abstract regexes to variables at all; when that changed, I was most disappointed to find I still couldn't parameterize the regexes properly.