Document how exporting variables in scripts works

Problem to solve

How exporting variables in scripts works is not really documented, which causes confusion, as seen in #3088 (closed).

Creating this issue based on Tomasz's comments:

#3088 (comment 1340977026):

The majority of comments in this issue refer to the fact that we don't have clear documentation on how variables work and why export my_var=... in script is not available in after_script while some other variables (like CI_JOB_ID) are.

Proposal

Add a new section to CI/CD variables called Exported variable handling in runner that documents:

  1. The current state: what works, what doesn't.
  2. The design choice that makes some things impossible and reason why it was made.
  3. Known workarounds.

so that users don't need to dig for the answer deep in an issue thread like this one 🙂

#3088 (comment 1341058342)

I'll leave here some descriptions and examples that we can use to provide such documentation.


An example of why this works the way it does, without going deep into Runner's codebase, would be these two scripts:

script.sh

#!/usr/bin/env bash

set -eo pipefail

export CI_JOB_ID=1

echo "This is before script"

export MY_VARIABLE="variable"

echo "This is script"

test -z "${FAIL_THE_SCRIPT}"

echo "FAIL_THE_SCRIPT variable was empty"

echo "MY_VARIABLE's value is: ${MY_VARIABLE}"

after_script.sh

#!/usr/bin/env bash

set -eo pipefail

export CI_JOB_ID=1

echo "This is after_script"

echo "MY_VARIABLE's value is: ${MY_VARIABLE}"

After making these files executable, executing either of them will create a new shell context (bash in this case) and run the script's content in it. This is also what Runner does - every job step (and there are several of them) is executed in a fresh shell context so that we can manage what happens in each of them. For some executors (like Docker or Kubernetes) it would even be impossible not to separate shell contexts.

You can see that both scripts have set -eo pipefail - this is to detect early any failure that was not handled (read: a command in the pipeline that doesn't return exit code 0). Job execution then stops at the first detected error; Runner tries to capture the exit code and send it back to GitLab. Any failure happening during one of these steps (artifacts download/upload, updating Git sources, before_script concatenated with script) immediately causes job failure. The test -z "${FAIL_THE_SCRIPT}" command will let us see how it works.
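The effect of set -eo pipefail can be tried in isolation with a tiny standalone script (the step labels here are illustrative, not part of Runner):

```shell
#!/usr/bin/env bash
# Abort on the first unhandled non-zero exit code anywhere in a pipeline.
set -eo pipefail

echo "step 1"
test -z "${FAIL_THE_SCRIPT}"   # exits non-zero when FAIL_THE_SCRIPT is set
echo "step 2"                  # never reached when the test above fails
```

Run it as FAIL_THE_SCRIPT=1 bash demo.sh: only step 1 is printed and the script exits with a non-zero code. Without the variable, both lines print and the exit code is 0.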

after_script doesn't - it's designed to always be executed, so that in case of script failures users still have the possibility to run some cleanup tasks.

In script.sh you can see that it first prints This is before script and then This is script, and the MY_VARIABLE definition looks like it was made in the before script. This script is a big simplification of the one Runner would compose from this job definition:

before_script:
- echo "This is before script"
- export MY_VARIABLE="variable"
script:
- echo "This is script"
- test -z "${FAIL_THE_SCRIPT}"
- echo "FAIL_THE_SCRIPT variable was empty"
- echo "MY_VARIABLE's value is: ${MY_VARIABLE}"

So the content of both before_script and script is concatenated and not executed separately! before_script was added as a .gitlab-ci.yml syntax feature to simplify job definitions. There are often things that need to be done the same way at the beginning of different jobs. before_script is a shortcut to handle that case without the need to repeat yourself in every job or to hack around with YAML anchors.
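The practical consequence of the concatenation: because both sections end up in one shell process, an export made in before_script is visible to script. A minimal sketch of the concatenated result:

```shell
#!/usr/bin/env bash
set -eo pipefail

# before_script section, as concatenated by Runner:
export MY_VARIABLE="variable"

# script section - same shell process, so the export above is visible:
echo "MY_VARIABLE's value is: ${MY_VARIABLE}"
```

Running this prints MY_VARIABLE's value is: variable - no extra mechanism is needed, because no process boundary was crossed.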

after_script is a totally separate script in a totally separate shell context. In fact, we may want to consider changing the naming (which would be a breaking change, so not a fast one; but doable) because I see how before_script and after_script can be confusing: both are named in a similar way, but they work totally differently. This is however a wider change, going beyond just the Runner group's responsibility.

Getting back to my example:

  1. We have two separate scripts representing two separate job steps that runner would execute separately.
  2. We have support for automatic failure detection.
  3. We have a MY_VARIABLE variable that is defined within script.sh.

Now, why is this variable not available in after_script.sh? Because what Runner does is more or less this:

$ ./script.sh; ec=$?; ./after_script.sh; echo "job exit code was ${ec}"
This is before script
This is script
FAIL_THE_SCRIPT variable was empty
MY_VARIABLE's value is: variable
This is after_script
MY_VARIABLE's value is:
job exit code was 0

script.sh is executed - this creates a totally new shell process. It prints to the output, defines a variable, prints to the output again, simulates a condition that could possibly fail, prints to the output again, and prints the value of the defined variable to the output.

Then after_script.sh is executed - this creates yet another new shell process, a process that doesn't share a context with the script.sh execution at all. It prints to the output and then prints the value of MY_VARIABLE to the output. And that variable is empty, because after_script.sh has no information about its existence.
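The isolation can be reproduced without any scripts at all, using two bash -c invocations - each one is a separate child process, just like Runner's job steps:

```shell
# First "step": a fresh shell that exports a variable and can see it.
bash -c 'export MY_VARIABLE="variable"; echo "step 1 sees: ${MY_VARIABLE}"'

# Second "step": another fresh shell; the export above never reached it,
# so the fallback <empty> marker is printed instead.
bash -c 'echo "step 2 sees: ${MY_VARIABLE:-<empty>}"'
```

Assuming MY_VARIABLE is not already set in the calling shell, this prints step 1 sees: variable followed by step 2 sees: <empty>.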

You can also see that the script.sh execution detected no failures, so the final job exit code is 0 (read: the job succeeds).

Here is how that would behave if the failing condition were met (real-life example: a unit test suite finds a failing test and exits with a non-zero exit code):

$ FAIL_THE_SCRIPT=1 ./script.sh; ec=$?; ./after_script.sh; echo "job exit code was ${ec}"
This is before script
This is script
This is after_script
MY_VARIABLE's value is:
job exit code was 1

script.sh starts printing things until test -z "${FAIL_THE_SCRIPT}" exits with a non-zero exit code. Because of set -eo pipefail, that immediately interrupts script execution (automatically on Unix platforms; we need to "simulate" that behavior on Windows). At the end of the output we can even see that the failure was properly detected. But after_script.sh is executed anyway, as it's the place to do any cleanup that could be required. Moreover, a failure in that step would not influence the final job state at all.

The last question is: why, in that case - where the scripts are executed in separate shell contexts that don't share exports, aliases, local function definitions, or any other local shell state - is a variable like CI_JOB_ID available in every step?

Because Runner knows it even before the job is started. It's not called a predefined variable without a reason. Many variables that are defined outside of the job execution itself are known to GitLab, or at least to Runner, so they can be included in the list of exports that is done at the beginning of every step script.
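A rough sketch of that mechanism - the prelude variable below is an illustrative simplification, not Runner's actual generated code. The known exports are prepended to every step script, so each fresh shell sees them:

```shell
# Simplified stand-in for the exports Runner prepends to every step script:
prelude='export CI_JOB_ID=1'

# Each step is a separate fresh shell, but every one starts with the prelude,
# so the predefined variable is available in all of them:
bash -c "${prelude}; echo \"script step sees CI_JOB_ID=\${CI_JOB_ID}\""
bash -c "${prelude}; echo \"after_script step sees CI_JOB_ID=\${CI_JOB_ID}\""
```

This is exactly why CI_JOB_ID shows up in both script.sh and after_script.sh in the example above: each file exports it itself, at the top, before any user content runs.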

But a variable created as part of a user's arbitrary script defined in the job - this is something that Runner can't detect. Everything inside script, before_script, or after_script is taken "as is", and managing that is fully in the user's hands. Runner just adds a number of things around it to be able to detect failures and to orchestrate job execution.
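One known workaround follows directly from this: while the environment is not shared between steps, the working directory usually is, so a value can be passed through a file instead. This is a hedged sketch, not an official mechanism - the file name my_vars.env is illustrative, and it assumes the executor keeps the working directory between steps (the typical case):

```shell
# "script" step: a fresh shell that persists the value into a file.
bash -c 'echo "MY_VARIABLE=variable" > my_vars.env'

# "after_script" step: another fresh shell that re-reads it from the file.
bash -c 'source my_vars.env; echo "recovered value: ${MY_VARIABLE}"'
```

The second shell prints recovered value: variable - not because the export survived, but because the filesystem did.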

Edited Jun 12, 2023 by Fiona Neill