RHEL 7.x runner: jobs hang silently until timeout when the runner becomes unavailable

Summary

On RHEL 7.x runners, a job does not fail when the runner hits an availability problem (for example, a sudden outage or a restart). Instead, the job stays in the running state, with no error in the job log, until it reaches the configured timeout.

Steps to reproduce

  • Create a RHEL 7.x runner.
  • Create a long-running pipeline (e.g. a job that calls sleep).
  • Run the pipeline.
  • Once the job is running, reboot the runner server.
.gitlab-ci.yml
stages:          # List of stages for jobs, and their order of execution
  - build

build-job:       # This job runs in the build stage, which runs first.
  stage: build
  tags:
    - rhel9
  script:
    - echo "Compiling the code..."
    - sleep 600
    - echo "Compile complete."

Actual behavior

  • The job stays in the running state and no failure message appears in the job log.

Expected behavior

  • The job should fail with a message such as the following:
$ echo "Compiling the code..."
Compiling the code...
$ sleep 600
WARNING: after_script failed, but job will continue unaffected: context canceled
ERROR: Job failed (system failure): aborted: terminated

Relevant logs and/or screenshots


Environment description

config.toml contents
concurrent = 1
check_interval = 0
shutdown_timeout = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "<REDACTED>"
  url = "<REDACTED>"
  id = 21
  token = "<REDACTED>"
  token_obtained_at = 2024-02-02T03:52:39Z
  token_expires_at = 0001-01-01T00:00:00Z
  executor = "docker"
  [runners.cache]
    MaxUploadedArchiveSize = 0
  [runners.docker]
    tls_verify = false
    image = "ruby:2.7"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
    network_mtu = 0

Used GitLab Runner version

Version:      16.8.0
Git revision: c72a09b6
Git branch:   16-8-stable
GO version:   go1.21.5
Built:        2024-01-18T22:42:25+0000
OS/Arch:      linux/amd64

Possible fixes

  • Moving the runner to a more recent RHEL version (RHEL 9.x) appears to resolve the issue.
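
As a stopgap where upgrading is not yet possible, a job-level timeout could limit how long a stuck job stays in the running state. This is only a sketch and has not been verified on the affected RHEL 7.x runners; the 15-minute value is an assumption chosen to sit just above the 600-second sleep in the reproduction job. It does not address the underlying problem of the job not being marked as failed when the runner goes away.

```yaml
stages:
  - build

build-job:
  stage: build
  # Assumed value: caps how long the job can remain "running" if the
  # runner disappears. Adjust to slightly more than the expected runtime.
  timeout: 15 minutes
  tags:
    - rhel9
  script:
    - echo "Compiling the code..."
    - sleep 600
    - echo "Compile complete."
```

Without a job-level `timeout`, the project-level timeout applies, so a stuck job can occupy the pipeline for much longer than the job itself would normally take.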