Automatically canceled job continues execution and fails

Summary

When a job is canceled mid-run because another job failed and workflow.auto_cancel.on_job_failure is set to all, the script isn't always canceled cleanly. The command currently running in script fails, and after_script is executed as if the job had failed, even though the UI correctly shows the job as canceled.

Steps to reproduce

In our case, the lint and semgrep jobs run concurrently. If lint fails, it cancels the semgrep job while it is running, but the script continues executing and tries to grep an output file that never gets generated.

The minimal config below isn't 1:1 with our problem because I couldn't reproduce the exact failure, but it still demonstrates that the job doesn't cancel properly and instead continues execution.

.gitlab-ci.yml
workflow:
  auto_cancel:
    on_job_failure: all

stages:
  - lint

lint:
  stage: lint
  image: registry.gitlab.com/pipeline-components/ruff:0.18.0
  script: 
    - sleep 5
    - exit 1
  interruptible: true

semgrep-mr:
  stage: lint
  image: returntocorp/semgrep
  script:
    # this is just an attempt at simulating the problem. the actual script is something akin to
    # - |
    #   if ! semgrep ci --verbose --no-suppress-errors --output semgrep.xml --junit-xml; then
    #     grep "failure type" semgrep.xml
    #     exit 1
    #   fi
    - set -euo pipefail
    - |
      if ! sleep 50; then
        echo "this should not print"
        exit 1
      fi
    - echo "this should only print if the job succeeds, which it shouldn't"
  after_script:
    - |
      echo "\$CI_JOB_STATUS == $CI_JOB_STATUS"
      [ "${CI_JOB_STATUS:-canceled}" == "canceled" ] && exit 0
      echo "this should not print either"
  interruptible: true
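
For illustration, here is a rough local sketch of the failure mode in the real job (the commented-out semgrep script above). It rests on an unconfirmed assumption: that cancellation terminates the running command with a signal while the shell running the script survives. `fake_scan`, `simulate_job`, and the report path are hypothetical names used only for this sketch, and `kill -TERM` is a stand-in for whatever the runner actually does.

```shell
#!/usr/bin/env bash
# Local sketch only. Assumption: cancellation kills the running command
# but not the shell that runs the job script.
set -u

# Hypothetical report path; nothing ever writes it, like semgrep.xml in the real job.
report="${TMPDIR:-/tmp}/semgrep-sketch-$$.xml"

fake_scan() {
  # Stand-in for `semgrep ci`: a long-running command that is terminated
  # before it produces its report.
  sleep 30 &
  local pid=$!
  ( sleep 0.2; kill -TERM "$pid" ) &
  wait "$pid"   # non-zero (143 = 128 + SIGTERM in bash) once sleep is killed
}

simulate_job() {
  if ! fake_scan; then
    # Same shape as the real script: inspect the report, then fail.
    grep "failure type" "$report" 2>/dev/null \
      || echo "grep failed: report was never written"
    return 1
  fi
  echo "scan succeeded"
}

simulate_job || echo "job script would exit with status $?"
```

Under that assumption, the killed command looks like an ordinary failure to the shell, so the `if !` branch runs and grep is pointed at a file that does not exist. This matches what we see in the real job, but it does not explain the minimal config, where the script simply runs to completion.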

Actual behavior

semgrep-mr continues execution even though the UI says it's canceling. Both "this should only print if the job succeeds, which it shouldn't" and "this should not print either" show up in the job log. Nothing in the job log indicates that the job was canceled.

Expected behavior

The job log should only say that the job was canceled, and none of the echo output from script or after_script should be printed.

Relevant logs and/or screenshots

job log
Running with gitlab-runner 17.0.0 (44feccdf)
  on Shared runner g-JwDap9, system ID: r_g3Tb5m4hnDY2
Resolving secrets
section_start:1721301512:prepare_executor
Preparing the "kubernetes" executor
Using Kubernetes namespace: gitlab-shared-runner
Using Kubernetes executor with image returntocorp/semgrep ...
Using attach strategy to execute scripts...
section_end:1721301512:prepare_executor
section_start:1721301512:prepare_script
Preparing environment
Using FF_USE_POD_ACTIVE_DEADLINE_SECONDS, the Pod activeDeadlineSeconds will be set to the job timeout: 1h0m0s...
Waiting for pod gitlab-shared-runner/runner-g-jwdap9-project-1-concurrent-0-lj5t08e1 to be running, status is Pending
Running on runner-g-jwdap9-project-1-concurrent-0-lj5t08e1 via gitlab-shared-runner-gitlab-runner-557c9cbbb-qlggg...

section_end:1721301516:prepare_script
section_start:1721301516:get_sources
Getting source from Git repository
Fetching changes with git depth set to 10...
Initialized empty Git repository in /builds/custobar/custobar/.git/
Created fresh repository.
Checking out d451d77d as detached HEAD (ref is repro-canceled-job-after-script-bug)...

Skipping Git submodules setup

section_end:1721301522:get_sources
section_start:1721301522:step_script
Executing "step_script" stage of the job script
$ set -euo pipefail
$ if ! sleep 50; then # collapsed multi-line command
$ echo "this should only print if the job succeeds, which it shouldn't"
this should only print if the job succeeds, which it shouldn't

section_end:1721301573:step_script
section_start:1721301573:after_script
Running after_script
Running after script...
$ echo "\$CI_JOB_STATUS == $CI_JOB_STATUS" # collapsed multi-line command
$CI_JOB_STATUS == success
this should not print either

section_end:1721301573:after_script
section_start:1721301573:cleanup_file_variables
Cleaning up project directory and file based variables

section_end:1721301574:cleanup_file_variables
Job succeeded

Environment description

Kubernetes executor on a self-hosted GitLab instance.

config.toml contents
config.template.toml
[[runners]]
  [runners.cache]
    Type = "gcs"
    Path = ""
    Shared = true
    [runners.cache.gcs]
      BucketName = "not-relevant-here"
config.toml
shutdown_timeout = 0
concurrent = 4
check_interval = 1
log_level = "info"

Used GitLab Runner version

Running with gitlab-runner 17.0.0 (44feccdf)
  on Shared runner g-JwDap9, system ID: r_g3Tb5m4hnDY2
Resolving secrets
Preparing the "kubernetes" executor
Using Kubernetes namespace: gitlab-shared-runner
Using Kubernetes executor with image returntocorp/semgrep ...

Possible fixes