Automatically canceled job continues execution and fails
Summary
When a job is canceled mid-run because another job failed and workflow.auto_cancel.on_job_failure is set to all, its script isn't always canceled cleanly. The command currently running in script fails, and after_script is executed as if the job had failed, even though the UI correctly shows the job as canceled.
Steps to reproduce
In our case, a lint job and a semgrep job run concurrently. If lint fails, it cancels the semgrep job while it is running, but the script continues executing and tries to grep an output file that never gets generated.
The minimal config here isn't 1:1 with the problem because I couldn't reproduce it exactly, but it still demonstrates that the job isn't canceled properly and continues executing.
.gitlab-ci.yml
workflow:
  auto_cancel:
    on_job_failure: all

stages:
  - lint

lint:
  stage: lint
  image: registry.gitlab.com/pipeline-components/ruff:0.18.0
  script:
    - sleep 5
    - exit 1
  interruptible: true

semgrep-mr:
  stage: lint
  image: returntocorp/semgrep
  script:
    # this is just an attempt at simulating the problem. the actual script is something akin to
    # - |
    #   if ! semgrep ci --verbose --no-suppress-errors --output semgrep.xml --junit-xml; then
    #     grep "failure type" semgrep.xml
    #     exit 1
    #   fi
    - set -euo pipefail
    - |
      if ! sleep 50; then
        echo "this should not print"
        exit 1
      fi
    - echo "this should only print if the job succeeds, which it shouldn't"
  after_script:
    - |
      echo "\$CI_JOB_STATUS == $CI_JOB_STATUS"
      [ "${CI_JOB_STATUS:-canceled}" == "canceled" ] && exit 0
      echo "this should not print either"
  interruptible: true
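For the real script described in the comments above, a possible stopgap (it does not fix the cancellation itself) is to check that the report exists before grepping it. This is only a minimal sketch: the helper name report_failures and its exit code 2 are hypothetical, while the file name semgrep.xml and the "failure type" pattern are taken from the commented-out script.

```shell
# Hypothetical helper: only inspect the report if the scanner actually wrote it.
# Returns 2 when the report is missing (for example, the scan was interrupted),
# otherwise returns grep's own exit status.
report_failures() {
  report="$1"
  if [ ! -f "$report" ]; then
    echo "no report at $report; the scan was probably interrupted" >&2
    return 2
  fi
  grep "failure type" "$report"
}

# Intended use inside the job script (assumption, not the actual job):
#   if ! semgrep ci --verbose --no-suppress-errors --output semgrep.xml --junit-xml; then
#     report_failures semgrep.xml || true
#     exit 1
#   fi
```

With this guard, a canceled scan would at least produce a clear message instead of a confusing grep error about a missing file.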
Actual behavior
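semgrep-mr continues executing even though the UI says it's canceling. Both "this should only print if the job succeeds, which it shouldn't" and "this should not print either" show up in the job log, and CI_JOB_STATUS is reported as success. The step_script section runs from 1721301522 to 1721301573 (51 seconds), so sleep 50 ran to completion. Nothing in the job log indicates that the job was canceled.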
Expected behavior
The job log should only say that the job was canceled; none of the echo output above should appear.
Relevant logs and/or screenshots
job log
Running with gitlab-runner 17.0.0 (44feccdf)
  on Shared runner g-JwDap9, system ID: r_g3Tb5m4hnDY2
Resolving secrets
section_start:1721301512:prepare_executor
Preparing the "kubernetes" executor
Using Kubernetes namespace: gitlab-shared-runner
Using Kubernetes executor with image returntocorp/semgrep ...
Using attach strategy to execute scripts...
section_end:1721301512:prepare_executor
section_start:1721301512:prepare_script
Preparing environment
Using FF_USE_POD_ACTIVE_DEADLINE_SECONDS, the Pod activeDeadlineSeconds will be set to the job timeout: 1h0m0s...
Waiting for pod gitlab-shared-runner/runner-g-jwdap9-project-1-concurrent-0-lj5t08e1 to be running, status is Pending
Running on runner-g-jwdap9-project-1-concurrent-0-lj5t08e1 via gitlab-shared-runner-gitlab-runner-557c9cbbb-qlggg...
section_end:1721301516:prepare_script
section_start:1721301516:get_sources
Getting source from Git repository
Fetching changes with git depth set to 10...
Initialized empty Git repository in /builds/custobar/custobar/.git/
Created fresh repository.
Checking out d451d77d as detached HEAD (ref is repro-canceled-job-after-script-bug)...
Skipping Git submodules setup
section_end:1721301522:get_sources
section_start:1721301522:step_script
Executing "step_script" stage of the job script
$ set -euo pipefail
$ if ! sleep 50; then # collapsed multi-line command
$ echo "this should only print if the job succeeds, which it shouldn't"
this should only print if the job succeeds, which it shouldn't
section_end:1721301573:step_script
section_start:1721301573:after_script
Running after_script
Running after script...
$ echo "\$CI_JOB_STATUS == $CI_JOB_STATUS" # collapsed multi-line command
$CI_JOB_STATUS == success
this should not print either
section_end:1721301573:after_script
section_start:1721301573:cleanup_file_variables
Cleaning up project directory and file based variables
section_end:1721301574:cleanup_file_variables
Job succeeded
Environment description
Kubernetes executor on a self-hosted GitLab instance.
config.toml contents
config.template.toml
[[runners]]
[runners.cache]
Type = "gcs"
Path = ""
Shared = true
[runners.cache.gcs]
BucketName = "not-relevant-here"
config.toml
shutdown_timeout = 0
concurrent = 4
check_interval = 1
log_level = "info"
Used GitLab Runner version
Running with gitlab-runner 17.0.0 (44feccdf)
on Shared runner g-JwDap9, system ID: r_g3Tb5m4hnDY2
Resolving secrets
Preparing the "kubernetes" executor
Using Kubernetes namespace: gitlab-shared-runner
Using Kubernetes executor with image returntocorp/semgrep ...