Improve step_script to bring it on par with Runner legacy path

What does this MR do?

This MR fixes the following assertion failures reported by runner-backbench:

[2026-03-31 18:05:20] [assert] ✗ multistep/failure_on_script_step,_release_is_skipped,_after_script_runs: 1 assert error(s)
[2026-03-31 18:05:20] [assert]     - assertion failed: job.trace contains 'exit code 1'
[2026-03-31 18:05:20] [assert] ✗ passing_envs/environment_variable_from_script_to_after_script: 3 assert error(s)
[2026-03-31 18:05:20] [assert]     - assertion failed: job.trace contains 'hellovalue=world'
[2026-03-31 18:05:20] [assert]     - assertion failed: job.trace contains 'foovalue=bar'
[2026-03-31 18:05:20] [assert]     - assertion failed: job.trace contains 'unknown/path/bar: no matching files'
[2026-03-31 18:05:20] [assert] ✗ timeout/job_timeout_during_script: 1 assert error(s)
[2026-03-31 18:05:20] [assert]     - assertion failed: job.trace contains 'step_script could not run to completion'
[2026-03-31 18:05:20] [assert] ✗ multistep/failure_on_release_step,_after_script_runs: 1 assert error(s)
[2026-03-31 18:05:20] [assert]     - assertion failed: job.trace contains 'exit code 1'
[2026-03-31 18:05:20] [assert] ✗ images/ownership_overflow: 1 assert error(s)
[2026-03-31 18:05:20] [assert]     - assertion failed: job.exit_code == 137
[2026-03-31 18:05:20] [assert] ✗ timeout/script_timeout: 1 assert error(s)
[2026-03-31 18:05:20] [assert]     - assertion failed: job.trace contains 'step_script could not run to completion'

With Claude's help, the root cause of each failure was identified and fixed. Only images/ownership_overflow remains!

multistep/failure_on_script_step,_release_is_skipped,_after_script_runs and multistep/failure_on_release_step,_after_script_runs

Assertion: job.trace contains 'exit code 1'

Commit 93f012c96 added normalization in wrapStepStageErr (common/build.go:535-542) that converts "exit status N" → "exit code N". The fix is correct in principle — the "Job failed: ..." message written via buildLogger.SoftErrorln at build.go:1291 would contain "exit code 1" after normalization.

If still failing, the root cause is that status.Message from step-runner doesn't contain "exit status" at all for one of these test configurations. The strings.TrimSpace(code) → strconv.Atoi chain also silently aborts if the code part has trailing non-numeric characters. These tests were pre-existing failures on main that this branch attempts to fix.
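The normalization and its silent-abort case can be illustrated with a minimal sketch. This is not the code from wrapStepStageErr; normalizeExitStatus is a hypothetical helper that only assumes the message has the form "... exit status N".

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// normalizeExitStatus is a hypothetical helper, not the actual
// wrapStepStageErr implementation. It rewrites "exit status N" to
// "exit code N" and returns msg unchanged when the marker is missing
// or when the part after the marker is not a plain integer.
func normalizeExitStatus(msg string) string {
	const marker = "exit status "
	before, after, found := strings.Cut(msg, marker)
	if !found {
		return msg // message never contained "exit status"
	}
	n, err := strconv.Atoi(strings.TrimSpace(after))
	if err != nil {
		return msg // trailing non-numeric text: normalization is skipped
	}
	return before + "exit code " + strconv.Itoa(n)
}

func main() {
	for _, msg := range []string{
		"exec: exit status 1",
		"exec: exit status 1 (killed)",
		"step failed",
	} {
		fmt.Printf("%q -> %q\n", msg, normalizeExitStatus(msg))
	}
}
```

Only the first message is rewritten; the other two pass through untouched, which is the behaviour that would leave 'exit code 1' missing from the trace.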

passing_envs/environment_variable_from_script_to_after_script

Assertions: hellovalue=world, foovalue=bar, unknown/path/bar: no matching files

The concrete runner's loadGitlabEnv (runner.go:388-413) re-reads the GITLAB_ENV file before each step, rebuilding the overlay. This requires setupGitlabEnv to have created the file first (runner.go:368-382).

The env file is created at {WorkingDir}.tmp/gitlab_runner_env. If WorkingDir is the project directory inside the container (e.g., /builds/project), then .tmp is a sibling directory that must be writable. If the ownership_overflow-style permissions prevent creation of the .tmp dir, setupGitlabEnv fails silently or loadGitlabEnv reads nothing. However, more likely: the jobs use the native steps: format (len(b.Job.Run) > 0), which routes through executeStepStage → steps.Execute → step-runner directly, bypassing the concrete runner entirely. In that path, loadGitlabEnv is never called and GITLAB_ENV is not reloaded between steps. Variables written by script are not available in custom_step or after_script.
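A rough sketch of what re-reading the env file before each step could look like. It assumes the file holds one KEY=VALUE pair per line; parseEnvLines and loadEnvFile are hypothetical names for illustration, not the functions in runner.go.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseEnvLines builds an overlay from KEY=VALUE lines (assumed format).
// Lines without "=" or with an empty key are ignored.
func parseEnvLines(content string) map[string]string {
	overlay := make(map[string]string)
	for _, line := range strings.Split(content, "\n") {
		key, value, ok := strings.Cut(line, "=")
		if !ok || key == "" {
			continue
		}
		overlay[key] = value
	}
	return overlay
}

// loadEnvFile is a hypothetical stand-in for the reload step. A missing
// file yields an empty overlay, so a step that ran before the file was
// created simply sees no extra variables.
func loadEnvFile(path string) (map[string]string, error) {
	data, err := os.ReadFile(path)
	if os.IsNotExist(err) {
		return map[string]string{}, nil
	}
	if err != nil {
		return nil, err
	}
	return parseEnvLines(string(data)), nil
}

func main() {
	// Path is illustrative only.
	overlay, err := loadEnvFile("/nonexistent/.tmp/gitlab_runner_env")
	fmt.Println(len(overlay), err)
	fmt.Println(parseEnvLines("hello=world\nfoo=bar\n"))
}
```

If this reload never runs between steps, as in the native steps path described above, variables written by script never reach after_script.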

timeout/job_timeout_during_script

Assertion: job.trace contains 'step_script could not run to completion'

The warning at build.go:729-732 fires when errors.Is(err, context.DeadlineExceeded). When the job context expires, c.RunAndFollow (gRPC) returns a gRPC status error (codes.DeadlineExceeded), not a Go context.DeadlineExceeded. gRPC status errors don't unwrap to context.DeadlineExceeded, so errors.Is returns false and the warning is never logged.
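The mismatch can be reproduced with the standard library alone. In the sketch below, rpcStatusError is a stand-in for a gRPC status error (it carries a code and has no Unwrap method), and isDeadlineExceeded is a hypothetical helper that accepts both forms. With the real gRPC library the second check would be status.Code(err) == codes.DeadlineExceeded.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// rpcStatusError is a stand-in for a gRPC status error: it carries a
// code but does not unwrap to any context error.
type rpcStatusError struct {
	code string
	msg  string
}

func (e *rpcStatusError) Error() string {
	return "rpc error: code = " + e.code + " desc = " + e.msg
}

// isDeadlineExceeded is a hypothetical helper that treats both the Go
// context error and the RPC status code as a timeout.
func isDeadlineExceeded(err error) bool {
	if errors.Is(err, context.DeadlineExceeded) {
		return true
	}
	var se *rpcStatusError
	return errors.As(err, &se) && se.code == "DeadlineExceeded"
}

func main() {
	var err error = &rpcStatusError{code: "DeadlineExceeded", msg: "context deadline exceeded"}

	fmt.Println(errors.Is(err, context.DeadlineExceeded)) // false
	fmt.Println(isDeadlineExceeded(err))                  // true
}
```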

timeout/script_timeout

Assertion: job.trace contains 'step_script could not run to completion'

Commit c340cf2a4 added r.logWarningf(...) at runner.go:224 inside the concrete runner, firing when errors.Is(err, ErrJobScriptTimeout). This is correct for the concrete runner path (UseConcrete=true, len(b.Job.Run)==0).

However, the RUNNER_SCRIPT_TIMEOUT variable is only parsed by buildScriptTimeout() in functions/concrete/builder/builder.go:381. If the test runs through the native steps path (len(b.Job.Run) > 0 with FF_SCRIPT_TO_STEP_MIGRATION=true), the concrete runner is never invoked, RUNNER_SCRIPT_TIMEOUT is just a CI variable with no effect, sleep 20 runs to completion normally, and no timeout warning is ever emitted.
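For reference, a minimal sketch of how a script timeout can be turned into a sentinel error that the caller checks with errors.Is. ErrJobScriptTimeout is redefined locally here, and runWithScriptTimeout, sleepScript and scriptTimesOut are hypothetical names; none of this is the concrete runner's code.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// ErrJobScriptTimeout is a local stand-in for the sentinel named above.
var ErrJobScriptTimeout = errors.New("job script timeout")

// runWithScriptTimeout runs script under its own deadline and wraps the
// error with the sentinel only when the script deadline (not the parent
// context) is what expired.
func runWithScriptTimeout(parent context.Context, timeout time.Duration, script func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()

	err := script(ctx)
	if err != nil && parent.Err() == nil && errors.Is(ctx.Err(), context.DeadlineExceeded) {
		return fmt.Errorf("%w: %v", ErrJobScriptTimeout, err)
	}
	return err
}

// sleepScript simulates a "sleep" command that honours cancellation.
func sleepScript(d time.Duration) func(context.Context) error {
	return func(ctx context.Context) error {
		select {
		case <-time.After(d):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// scriptTimesOut reports whether a script sleeping for sleep hits the timeout.
func scriptTimesOut(timeout, sleep time.Duration) bool {
	err := runWithScriptTimeout(context.Background(), timeout, sleepScript(sleep))
	return errors.Is(err, ErrJobScriptTimeout)
}

func main() {
	if scriptTimesOut(10*time.Millisecond, time.Second) {
		fmt.Println("WARNING: step_script could not run to completion")
	}
}
```

When nothing applies the timeout at all, as in the native steps path, the script simply finishes and the warning branch is never reached.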

Why was this MR needed?

Bring step_script on par with Runner legacy behaviour.

What's the best way to test this MR?

Only images/ownership_overflow is still failing for now, both on glci and inside the Pilot Runner pipeline:

[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✗ images/ownership_overflow: 1 assert error(s)
[runner-step-script-test] [2026-04-01 21:57:27] [assert]     - assertion failed: job.exit_code == 137
[2026-04-02 00:35:32] [assert] ✗ images/ownership_overflow: 1 assert error(s)
[2026-04-02 00:35:32] [assert]     - assertion failed: job.exit_code == 137

Local test with glci

> glci version
glci  commit 5d9a2771d16f525d2db8953dc54674f778cb78c4
daemon commit 5d9a2771d16f525d2db8953dc54674f778cb78c4 (pid 44652, up 2h33m40s)

runner-step-script-test log

Job Log
── stage: test ──
  ● runner-step-script-test
[runner-step-script-test] Running with gitlab-runner 18.10.0 (ac71f4d8)
[runner-step-script-test]   on glci-local-runner bd74f54e6, system ID: s_b188029b2abb
[runner-step-script-test] ── Preparing the "docker" executor ──
[runner-step-script-test] Using Docker executor with image docker:24.0.9 ...
[runner-step-script-test] Starting service docker:24.0.9-dind...
[runner-step-script-test] Using effective pull policy of [if-not-present] for container docker:24.0.9-dind
[runner-step-script-test] Using locally found image version due to "if-not-present" pull policy
[runner-step-script-test] Using docker image sha256:9b17a9f25adf17b88d0a013b4f00160754adf4b07ccbe9986664a49886c2c98e for docker:24.0.9-dind with digest docker@sha256:9b17a9f25adf17b88d0a013b4f00160754adf4b07ccbe9986664a49886c2c98e ...
[runner-step-script-test] Waiting for services to be up and running (timeout 30 seconds)...
[runner-step-script-test] Using effective pull policy of [if-not-present] for container docker:24.0.9
[runner-step-script-test] Using locally found image version due to "if-not-present" pull policy
[runner-step-script-test] Using docker image sha256:9b17a9f25adf17b88d0a013b4f00160754adf4b07ccbe9986664a49886c2c98e for docker:24.0.9 with digest docker@sha256:9b17a9f25adf17b88d0a013b4f00160754adf4b07ccbe9986664a49886c2c98e ...
[runner-step-script-test] ── Preparing environment ──
[runner-step-script-test] Using effective pull policy of [if-not-present] for container sha256:9c7de97a81b69ce6f973da14a98f633eb7ffd0e79e9f94619301565e6eb083f5
[runner-step-script-test] Running on runner-bd74f54e6-project-1-concurrent-0 via ratchade--20240612-H2W0T...
[runner-step-script-test] ── Getting source from Git repository ──
[runner-step-script-test] Gitaly correlation ID: 7129df8ee96148a591f0f50cb62c20f8
[runner-step-script-test] Fetching changes...
[runner-step-script-test] Initialized empty Git repository in /builds/gitlab-org/ci-cd/runner-tools/pilot-runners/.git/
[runner-step-script-test] Created fresh repository.
[runner-step-script-test] Checking out 9bff6660 as detached HEAD (ref is step_365-runner-backbench-for-tests)...
[runner-step-script-test] Updating/initializing submodules recursively...
[runner-step-script-test] Submodule 'grit' (https://gitlab.com/gitlab-org/ci-cd/runner-tools/grit.git) registered for path 'grit'
[runner-step-script-test] Synchronizing submodule url for 'grit'
[runner-step-script-test] Cloning into '/builds/gitlab-org/ci-cd/runner-tools/pilot-runners/grit'...
[runner-step-script-test] Submodule path 'grit': checked out 'acc300582ddb5b181b10095b480951c80a7edb37'
[runner-step-script-test] Updated submodules
[runner-step-script-test] Synchronizing submodule url for 'grit'
[runner-step-script-test] Entering 'grit'
[runner-step-script-test] Configuring submodules to use parent git credentials...
[runner-step-script-test] Entering 'grit'
[runner-step-script-test] Pulling LFS files...
[runner-step-script-test] Entering 'grit'
[runner-step-script-test] ── Downloading artifacts ──
[runner-step-script-test] Downloading artifacts for build:image:runner-backbench: [amd64] (3)...
[runner-step-script-test] Downloading artifacts from coordinator... ok        correlation_id=535ba96b43f34840b06d0a77dc077904 host=host.docker.internal:57204 id=3 responseStatus=200 OK token=6008fb5fa
[runner-step-script-test] Downloading artifacts for build:image:runner-backbench: [arm64] (4)...
[runner-step-script-test] Downloading artifacts from coordinator... ok        correlation_id=95541c4daf6c4cb38059a03438c149b5 host=host.docker.internal:57204 id=4 responseStatus=200 OK token=de84ea512
[runner-step-script-test] Downloading artifacts for build:binaries: [amd64] (5)...
[runner-step-script-test] Downloading artifacts from coordinator... ok        correlation_id=2de4f10dbc7244b591b3952d35fdd612 host=host.docker.internal:57204 id=5 responseStatus=200 OK token=9eeb1f2e9
[runner-step-script-test] Downloading artifacts for build:binaries: [arm64] (6)...
[runner-step-script-test] Downloading artifacts from coordinator... ok        correlation_id=09a5bc01ffc04bc49b382e03c607f680 host=host.docker.internal:57204 id=6 responseStatus=200 OK token=b140d4baf
[runner-step-script-test] Downloading artifacts for build:helper-linux: [amd64, ] (7)...
[runner-step-script-test] Downloading artifacts from coordinator... ok        correlation_id=7a6f5b4802a742af9182f3316d11cefc host=host.docker.internal:57204 id=7 responseStatus=200 OK token=373b4ac61
[runner-step-script-test] Downloading artifacts for build:helper-linux: [arm64, -arm64] (2)...
[runner-step-script-test] Downloading artifacts from coordinator... ok        correlation_id=077e0f969bff4fbfad287cbb50869d11 host=host.docker.internal:57204 id=2 responseStatus=200 OK token=ec5b900ba
[runner-step-script-test] Downloading artifacts for build:runner-image: [amd64, , ] (9)...
[runner-step-script-test] Downloading artifacts from coordinator... ok        correlation_id=a2132d4119334ab4828c9ddeb8b8a722 host=host.docker.internal:57204 id=9 responseStatus=200 OK token=b4d2a5d85
[runner-step-script-test] Downloading artifacts for build:runner-image: [arm64, --platform linux/arm64, -arm64] (8)...
[runner-step-script-test] Downloading artifacts from coordinator... ok        correlation_id=98944abf766c425db0af6d6f86e6922c host=host.docker.internal:57204 id=8 responseStatus=200 OK token=2e9c4c44c
[runner-step-script-test] ── Executing "step_run" stage of the job script ──
[runner-step-script-test] Using effective pull policy of [if-not-present] for container docker:24.0.9
[runner-step-script-test] Using docker image sha256:9b17a9f25adf17b88d0a013b4f00160754adf4b07ccbe9986664a49886c2c98e for docker:24.0.9 with digest docker@sha256:9b17a9f25adf17b88d0a013b4f00160754adf4b07ccbe9986664a49886c2c98e ...
[runner-step-script-test] step-runner is ready.
[runner-step-script-test] Running step name=setup_environment
[runner-step-script-test] fetch https://dl-cdn.alpinelinux.org/alpine/v3.20/main/aarch64/APKINDEX.tar.gz
[runner-step-script-test] fetch https://dl-cdn.alpinelinux.org/alpine/v3.20/community/aarch64/APKINDEX.tar.gz
[runner-step-script-test] (1/3) Installing libacl (2.3.2-r0)
[runner-step-script-test] (2/3) Installing tar (1.35-r2)
[runner-step-script-test] (3/3) Installing wget (1.24.5-r0)
[runner-step-script-test] Executing busybox-1.36.1-r29.trigger
[runner-step-script-test] OK: 58 MiB in 76 packages
[runner-step-script-test] go: downloading github.com/magefile/mage v1.17.0
[runner-step-script-test] Loaded image: gitlab-runner-helper:local
[runner-step-script-test] Loaded image: runner-backbench:local
[runner-step-script-test] Loaded image: gitlab-runner:local
[runner-step-script-test] 434ed4b8ae03954aa4cc61ab89169877401c232e35c96c4829ef1ccc5383b633
[runner-step-script-test] Starting backbench container...
[runner-step-script-test] 422951f39d0d3a84385e7457397b2921e1200e7239a6c856522a84cfdf6d87cc
[runner-step-script-test] backbench started
[runner-step-script-test] Running step name=run_test
[runner-step-script-test] Starting gitlab-runner container...
[runner-step-script-test] f5393fa3d564a8d5f46730d50b0ef87ffcadfe32557affc32214d4a13edaa3bf
[runner-step-script-test] gitlab-runner started
[runner-step-script-test] [2026-04-01 21:52:16] [assert] Waiting for file matching '*-docker.json' in /builds/gitlab-org/ci-cd/runner-tools/pilot-runners/out/ (timeout: 10m)
[runner-step-script-test] [2026-04-01 21:57:26] [assert] ✓ Found result file: /builds/gitlab-org/ci-cd/runner-tools/pilot-runners/out/20260401-215724_gitlab-runner-docker.json
[runner-step-script-test] [2026-04-01 21:57:26] [assert] ✓ Result path written to /builds/gitlab-org/ci-cd/runner-tools/pilot-runners/out/result_path.txt
[runner-step-script-test] [2026-04-01 21:57:27] [assert] Checking result file: /builds/gitlab-org/ci-cd/runner-tools/pilot-runners/out/20260401-215724_gitlab-runner-docker.json
[runner-step-script-test] [2026-04-01 21:57:27] [assert] Runner version: 18.11.0~pre.884.g5ad535cc
[runner-step-script-test] [2026-04-01 21:57:27] [assert] Total test cases: 46
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ successful/successful_build_with_release_and_after_script_step: passed (state=success)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ bash/script_error: passed (state=failed)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ build_setting/incorrect_CI_DEBUG_SERVICES: passed (state=success)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ cancel/during_after_script_phase: passed (state=success)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ exit_code/exit_code_99: passed (state=failed)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ images/tag_not_found: passed (state=failed)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ timeout/after_script_timeout_but_job_is_successful: passed (state=success)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ build_setting/incorrect_GIT_STRATEGY: passed (state=success)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ images/with_stderr_output: passed (state=success)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ log_line_length/buffer_sized_log: passed (state=success)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ bash/return_129: passed (state=failed)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ masking/masking: passed (state=success)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ script_length/bash_nested_here-string: passed (state=success)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ trace_length/exceed_trace_limit,_aborted_(bash): passed (state=failed)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ bash/export_env_var: passed (state=success)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ exit_code/exit_code_1: passed (state=failed)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ images/expanded_name: passed (state=success)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✗ images/ownership_overflow: 1 assert error(s)
[runner-step-script-test] [2026-04-01 21:57:27] [assert]     - assertion failed: job.exit_code == 137
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ multistep/failure_on_script_step,_release_is_skipped,_after_script_runs: passed (state=failed)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ passing_envs/environment_variable_from_script_to_after_script: passed (state=success)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ trace_length/exceed_trace_limit,_successful_(bash): passed (state=failed)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ log_line_length/long_log: passed (state=failed)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ bash/create_artifacts: passed (state=success)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ cancel/during_script_phase: passed (state=failed)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ images/expanded_name_missing_variable: passed (state=failed)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ multistep/successful_build_with_release_and_after_script_step: passed (state=success)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ script_length/bash_normal_script: passed (state=success)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ bash/job_succeeds: passed (state=success)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ bash/return_128: passed (state=failed)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ ci_job_status/state_on_success: passed (state=success)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ images/without_root: passed (state=success)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ stress/stress: passed (state=success)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ timeout/script_timeout: passed (state=failed)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ variables/raw_variable: passed (state=success)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ images/host_cannot_be_resolved: passed (state=failed)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ invalid_command/unknown_command: passed (state=failed)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ multistep/failure_on_release_step,_after_script_runs: passed (state=failed)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ timeout/job_timeout_during_script: passed (state=failed)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ variables/file_variable: passed (state=success)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ bash/command_error: passed (state=failed)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ build_setting/incorrect_GIT_SUBMODULE_DEPTH: passed (state=success)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ ci_job_status/state_on_failure: passed (state=failed)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ images/not_found: passed (state=failed)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ images/repo_not_found: passed (state=failed)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ log_line_length/short_line: passed (state=success)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✓ script_length/bash_huge_script: passed (state=success)
[runner-step-script-test] [2026-04-01 21:57:27] [assert] ✗ 1 assert error(s) found across 46 test case(s)
[runner-step-script-test] Error: 1 assert error(s) found
[runner-step-script-test] ── Uploading artifacts for failed job ──
[runner-step-script-test] Uploading artifacts...
[runner-step-script-test] out/*.json: found 1 matching artifact files and directories 
[runner-step-script-test] out/*.log: found 1 matching artifact files and directories 
[runner-step-script-test] Uploading artifacts as "archive" to coordinator... 201 Created  correlation_id=7f65a24fb3d24155ba549fc463976baa id=10 responseStatus=201 Created token=684722610
[runner-step-script-test] ── Cleaning up project directory and file based variables ──
[runner-step-script-test] ERROR: Job failed: step "run_test": exec: exit status 1

Step-script test job in Pilot Runners pipeline

Job: https://gitlab.com/gitlab-org/ci-cd/runner-tools/pilot-runners/-/jobs/13753040943

What are the relevant issue numbers?

relates to Create test harness for runner-backbench backwa... (step-runner#365 - closed)

Edited by Romuald Atchadé
