Regex variables do not handle interpolated values correctly
As of #35438 (closed), it is possible to store a regex pattern in a variable and then use it in an `if` condition in a `rules` or `only` clause. However, if the variable is interpolated, i.e. its value contains a reference to another variable, this isn't handled correctly: the interpolated part is treated as if the dollar sign and variable name were still there.
Test 1
Reproduction:
`.gitlab-ci.yml`:
```yaml
variables:
  ABC: "aabc"
  B: "b"
  REGEX_LITERAL: '/^a+bc$/'
  REGEX_INTERPOLATED: '/^a+${B}c$/'

stages:
  - 'test'

########
# Jobs #
########

.test-job:
  stage: 'test'
  tags:
    - 'build'
  script:
    - "exit 0"

test-job-inline_regex_literal:
  extends: '.test-job'
  rules:
    - if: '$ABC =~ /^a+bc$/'
      when: 'manual'
    - when: 'never'

test-job-inline_regex_interpolated:
  extends: '.test-job'
  rules:
    - if: '$ABC =~ /^a+${B}c$/'
      when: 'manual'
    - when: 'never'

test-job-var_regex_literal:
  extends: '.test-job'
  rules:
    - if: '$ABC =~ $REGEX_LITERAL'
      when: 'manual'
    - when: 'never'

test-job-var_regex_interpolated:
  extends: '.test-job'
  rules:
    - if: '$ABC =~ $REGEX_INTERPOLATED'
      when: 'manual'
    - when: 'never'

echo-vars:
  stage: 'test'
  rules:
    - when: 'always'
  tags:
    - 'build'
  script:
    - |
      for var in 'ABC' 'B' 'REGEX_LITERAL' 'REGEX_INTERPOLATED' ; do
        python3 -c 'import sys, os; var_name = sys.argv[1]; var_val = os.getenv(var_name); print(f"${var_name}: {var_val!r}")' "${var}"
      done
```
Expected behaviour:
- Jobs `test-job-inline_regex_literal`, `test-job-var_regex_literal`, and `test-job-var_regex_interpolated` run
  - The documentation clearly states that variables may be defined in terms of other variables, and this has been borne out in the past by their behaviour in non-regex contexts.
- Job `test-job-inline_regex_interpolated` may or may not run
  - There is an existing issue, #209904, about supporting variable references within regex patterns.
Actual behaviour:
- Jobs `test-job-inline_regex_literal` and `test-job-var_regex_literal` run
- Jobs `test-job-var_regex_interpolated` and `test-job-inline_regex_interpolated` do not run
- The output of `echo-vars` confirms that `$REGEX_LITERAL` and `$REGEX_INTERPOLATED` have identical values inside the job:

```
[...]
Executing "step_script" stage of the job script 00:00
$ for var in 'ABC' 'B' 'REGEX_LITERAL' 'REGEX_INTERPOLATED' ; do # collapsed multi-line command
$ABC: 'aabc'
$B: 'b'
$REGEX_LITERAL: '/^a+bc$/'
$REGEX_INTERPOLATED: '/^a+bc$/'
Cleaning up file based variables
Job succeeded
```
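To make the expected semantics concrete, here is a minimal Python sketch of what this report expects to happen when a rule such as `$ABC =~ $REGEX_INTERPOLATED` is evaluated: expand `${NAME}` / `$NAME` references in the stored pattern first, strip the `/.../` delimiters, and only then match. The helpers `expand_vars` and `to_python_regex` are hypothetical, and Python's `re` stands in for whatever regex engine GitLab actually uses, so this illustrates the intended behaviour rather than GitLab's implementation.

```python
import re

# Hypothetical reference syntax: ${NAME} or $NAME, NAME made of word characters.
VAR_REF = re.compile(r"\$\{(\w+)\}|\$(\w+)")


def expand_vars(text, variables):
    """Replace variable references with their values (hypothetical helper).

    References to undefined names are left untouched, and a "$" that is not
    followed by a name (such as the anchor in "c$/") is never matched.
    """
    def repl(match):
        name = match.group(1) or match.group(2)
        return variables.get(name, match.group(0))
    return VAR_REF.sub(repl, text)


def to_python_regex(literal):
    """Compile a '/pattern/flags' literal with Python's re (flags are ignored)."""
    m = re.fullmatch(r"/(.*)/([a-z]*)", literal, re.DOTALL)
    if m is None:
        raise ValueError(f"not a /pattern/ literal: {literal!r}")
    return re.compile(m.group(1))


variables = {
    "ABC": "aabc",
    "B": "b",
    "REGEX_LITERAL": "/^a+bc$/",
    "REGEX_INTERPOLATED": "/^a+${B}c$/",
}

# Expected behaviour: both stored patterns expand to the same regex and match.
for name in ("REGEX_LITERAL", "REGEX_INTERPOLATED"):
    expanded = expand_vars(variables[name], variables)
    matched = to_python_regex(expanded).search(variables["ABC"]) is not None
    print(f"{name}: {expanded!r} -> matches $ABC: {matched}")

# For contrast: without expansion, Python's re reads "{B}" as literal text after
# the "$" anchor, so the pattern cannot match "aabc".
unexpanded = to_python_regex(variables["REGEX_INTERPOLATED"])
print("unexpanded matches $ABC:", unexpanded.search(variables["ABC"]) is not None)
```

Leaving undefined references untouched is a choice of this sketch, made so that anchors and literal dollar signs survive expansion.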
Test 2
Reproduction:
`.gitlab-ci.yml`:
```yaml
variables:
  AB: "aab"
  A_NEWLINE: "aa\n"
  '1': "b"
  REGEX_INTERPOLATED: '/(?m)^a+${1}/'

stages:
  - 'test'

########
# Jobs #
########

.test-job:
  stage: 'test'
  tags:
    - 'build'
  script:
    - "exit 0"

test-job-ab:
  extends: '.test-job'
  rules:
    - if: '$AB =~ $REGEX_INTERPOLATED'
      when: 'manual'
    - when: 'never'

test-job-a_newline:
  extends: '.test-job'
  rules:
    - if: '$A_NEWLINE =~ $REGEX_INTERPOLATED'
      when: 'manual'
    - when: 'never'
```
Expected behaviour:
- Job `test-job-ab` runs
- Job `test-job-a_newline` does not run
Actual behaviour:
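- Job `test-job-ab` does not run
- Job `test-job-a_newline` runs
  - This proves that the regex is behaving as if the dollar sign and variable name were still there, rather than, e.g., the reference being interpolated to a blank value (which would make both jobs run) or crashing the regex parser. Left in place, `${1}` is presumably read as the anchor `$` repeated exactly once, so the effective pattern is `(?m)^a+$`, which matches `aa\n` but not `aab`.
- I was unable to use an `echo-vars` job to print the variable values, because `1` is not a legal variable name in Bash, so the job would crash without printing anything.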
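The reasoning above can be checked with a short Python sketch that evaluates both jobs' conditions under three readings of `/(?m)^a+${1}/`: the expected one (`${1}` expanded to `b`), the one the results imply (`${1}` kept and read as `$` repeated once), and a blank interpolation. Python's `re` does not accept a quantifier on `$`, so the implied pattern is written directly as `(?m)^a+$`; treating the two as equivalent is an inference from the observed results, not something confirmed in GitLab's source.

```python
import re

AB = "aab"
A_NEWLINE = "aa\n"

patterns = {
    # What the report expects: ${1} replaced by the value of variable '1' ("b").
    "expected": re.compile(r"(?m)^a+b"),
    # What the results imply: ${1} left in place and read as "$" repeated once,
    # which behaves like a plain end-of-line anchor under (?m).
    "observed": re.compile(r"(?m)^a+$"),
    # What a reference interpolated to an empty string would give.
    "blank": re.compile(r"(?m)^a+"),
}

results = {}
for label, pattern in patterns.items():
    runs_ab = pattern.search(AB) is not None
    runs_a_newline = pattern.search(A_NEWLINE) is not None
    results[label] = (runs_ab, runs_a_newline)
    print(f"{label}: test-job-ab runs={runs_ab}, test-job-a_newline runs={runs_a_newline}")
```

Only the `observed` reading reproduces the actual behaviour (`test-job-ab` skipped, `test-job-a_newline` run), while the `blank` reading would run both jobs.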
Notes:
There may be a concern that enabling variable interpolation in regexes will break existing code, by performing substitutions where they were not intended because a pattern coincidentally contains a dollar sign followed by the name of a variable. On that subject, I have two suggestions:
- Within regexes, require that variable references use `${varname}` rather than `$varname`. Unless the name of the variable is an integer, the curly brackets aren't a valid regex construct.
  - However, this isn't a complete solution: `[${var}]` could be either a character class containing a variable substitution or a character class matching 'v', 'a', 'r', the dollar sign, and both curly brackets.
- Interpolation in regex literals might break existing code, but regex variables have only been around for a week, so there should be very little existing code to break.
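A minimal Python sketch of the first suggestion, using a hypothetical `expand_braced_only` helper: only the braced form `${NAME}` is expanded, and a bare `$NAME` is left alone. It also reproduces the caveat in the list above, since `[${var}]` is expanded whenever a variable named `var` exists, even if a literal character class was intended.

```python
import re

# Only the braced form ${NAME} counts as a variable reference in this sketch.
BRACED_REF = re.compile(r"\$\{(\w+)\}")


def expand_braced_only(pattern, variables):
    """Expand ${NAME} for defined variables; leave everything else unchanged."""
    def repl(match):
        return variables.get(match.group(1), match.group(0))
    return BRACED_REF.sub(repl, pattern)


# Hypothetical variables, including one named "var" to show the ambiguity.
variables = {"ABC": "aabc", "B": "b", "var": "xyz"}

# Bare $B is not treated as a reference, so the pattern keeps its old meaning.
print(expand_braced_only("/^a+$B/", variables))
# The braced reference is expanded.
print(expand_braced_only("/^a+${B}c$/", variables))
# Ambiguous case: possibly meant as the literal class [$var{}], expanded anyway.
print(expand_braced_only("/[${var}]/", variables))
```

Note that Test 2's `${1}` would still be expanded under this rule if a variable named `1` is defined, which matches the integer-name exception mentioned above.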
Rationale:
The real-world use-case for which variable interpolation would be useful:
- My team has code spread across multiple GitLab repos. All of these repos use CI/CD pipelines derived from a single shared template.
- These pipelines support various tags which can be added to a Git commit message to control pipeline behaviour - force a pipeline for a non-master branch, control which deployment steps run automatically or manually, ignore errors in the unit-test step, etc.
- These tags include the project name, to ensure a tag intended for one repo doesn't affect another. However, the project name is the only part of these regexes that differs between repos, and it is the same for all tag regexes in a given repo. It would therefore be very useful if we could define the project name once, as a global variable, and have it inserted in all the places it is needed.
- Indeed, the original conception of the template was that all per-repo configuration would be confined to setting values in the global `variables` section of `.gitlab-ci.yml`, and that the rest of the file would be identical in every repo, in order to make it as simple as possible to keep them all in sync as the template evolved. Until recently, this was impossible because you couldn't abstract regexes to variables at all; when that changed, I was most disappointed to find I still couldn't parameterize the regexes properly.
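As a sketch of what the shared template could do if interpolation worked, the Python below keeps every tag regex identical across repos and derives the per-repo versions from a single project-name value. The tag syntax (`[force-pipeline:<project>]`, `[ignore-test-errors:<project>]`), the variable names, and the `tag_regexes` helper are all hypothetical; only the idea of one global project-name variable comes from the rationale above.

```python
import re

# Hypothetical shared template: identical in every repo. The tag syntax is
# invented for illustration; only ${PROJECT_NAME} varies between repos.
TAG_REGEX_TEMPLATES = {
    "FORCE_PIPELINE_REGEX": r"/\[force-pipeline:${PROJECT_NAME}\]/",
    "IGNORE_TEST_ERRORS_REGEX": r"/\[ignore-test-errors:${PROJECT_NAME}\]/",
}


def tag_regexes(project_name):
    """Build the per-repo tag regexes from the single project-name setting.

    Assumes the project name contains no regex metacharacters; the value is
    substituted verbatim, as plain variable interpolation would do.
    """
    return {
        name: re.compile(template.strip("/").replace("${PROJECT_NAME}", project_name))
        for name, template in TAG_REGEX_TEMPLATES.items()
    }


commit_message = "Fix flaky test [force-pipeline:billing]"
for repo in ("billing", "reporting"):
    forced = tag_regexes(repo)["FORCE_PIPELINE_REGEX"].search(commit_message) is not None
    print(f"{repo}: force pipeline = {forced}")
```

Only the `billing` repo reacts to the tag, which is the isolation the project name in each tag is meant to provide.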