
Long running jobs canceled in GitLab UI, but runner continues process

Note (revised 2022-04-28)

If you are still experiencing issues similar to the one described here, add a comment with your details to the "CI process does not receive SIGTERM on termination" issue.

Overview

I have a long-running compile job (~40 minutes). I made two pushes one after another and stopped the first running job in the UI. The UI reports that the job is canceled, but the second job stays pending.

I suspect that the runner keeps running the first job to completion because its process is not properly terminated. Is there a way to test my hypothesis?

I'm using the shell executor for the runner (GitLab and the runner are both on Ubuntu 16.04).

Edit: as written in a comment below, here are the steps to reproduce the problem:

Create a project with a simple .gitlab-ci.yml file:

build:
  stage: build
  tags:
    - ubuntu_amd64
  script: 
    - ping localhost

Start a pipeline and cancel it. This should also terminate the ping command, but it does not.

On the runner, check whether the process is still running:

ps aux | grep ping
gitlab-+ 19828  0.0  0.0   8656  1724 ?        S    07:59   0:00 ping localhost

Or just kill it with killall ping (use sudo if the runner runs as a different user).
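The manual check above can also be scripted. The following is only a minimal sketch that assumes a Linux runner host with /proc mounted. The helper name find_processes is hypothetical and is not part of GitLab Runner. It lists the PIDs whose command name matches, which is roughly what ps aux | grep ping shows.

```python
import os


def find_processes(name):
    """Return the PIDs whose command name equals `name`.

    Assumption: Linux with /proc available. /proc/<pid>/comm holds the
    command name truncated to 15 characters, so `name` must be short.
    """
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(f"/proc/{entry}/comm") as comm_file:
                if comm_file.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited while we were scanning
    return pids


if __name__ == "__main__":
    leftovers = find_processes("ping")
    if leftovers:
        print("ping is still running with PIDs:", leftovers)
    else:
        print("no ping process found")
```

Running this on the runner host after canceling the job shows whether the ping from the job script survived the cancellation.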

Proposal

At the moment we simply kill the process group with SIGKILL and ignore the result. Instead, we should allow the process to shut down gracefully by first sending SIGTERM and then, after a specific timeout, sending SIGKILL. This should help ensure that processes are terminated properly. We already have this implemented for the custom executor and should try to reuse that code to implement the same feature.
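To make the proposed behavior concrete, here is a minimal sketch of the SIGTERM-then-SIGKILL pattern on a POSIX system. It is written in Python for illustration only and is not the runner's actual code. The names start_job and terminate_gracefully and the 5-second default grace period are assumptions made for this example.

```python
import os
import signal
import subprocess


def start_job(argv):
    """Start argv in its own session, so its PID is also its process group ID."""
    return subprocess.Popen(argv, start_new_session=True)


def terminate_gracefully(proc, grace_seconds=5.0):
    """Send SIGTERM to the job's process group, then SIGKILL after a timeout.

    Assumes `proc` was created by start_job(). Returns the last signal
    that was sent, or None if the group had already disappeared.
    Limitation of this sketch: it only waits for the group leader, so
    children that outlive the leader after SIGTERM are not followed up.
    """
    pgid = proc.pid  # valid because of start_new_session=True
    try:
        os.killpg(pgid, signal.SIGTERM)
    except ProcessLookupError:
        return None  # nothing left to signal
    try:
        proc.wait(timeout=grace_seconds)
        return signal.SIGTERM
    except subprocess.TimeoutExpired:
        # The job ignored or outlasted SIGTERM, so force-kill the whole group.
        os.killpg(pgid, signal.SIGKILL)
        proc.wait()
        return signal.SIGKILL


if __name__ == "__main__":
    job = start_job(["ping", "localhost"])
    print("last signal sent:", terminate_gracefully(job).name)
```

Signaling the process group rather than a single PID is what lets the signal reach commands started by the job's shell, such as the ping in the reproduction steps.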

Merge Requests

  1. Extract process killer from custom executor
  2. Extract commander interface from custom executor
  3. Add Process groups to process pkg
  4. Use the same termination commands on Windows
    • For Windows on the shell executor, we call taskkill, while in the process package we just call process.Kill(). Investigate which one is better, or whether we should use both.
  5. Rename test file
  6. Send SIGTERM then SIGKILL for shell executor

Original merge request

Edited by Darren Eastman