Long running jobs canceled in GitLab UI, but runner continues process
Note (revised 2022-04-28)
If you are still experiencing issues similar to the one described here, add a comment with your details to the "CI process does not receive SIGTERM on termination" issue.
I have a long-running compile job (~40 minutes). I made two pushes one after another and stopped the first running job in the UI. GitLab tells me that the job is canceled, but the second job stays pending.
I suspect that the runner keeps working on the first job until it finishes, because the job's process is not properly terminated.
Is there a way to test my hypothesis?
I'm using the shell executor for the runner (GitLab and the runner are both on Ubuntu 16.04).
Edit: as written in a comment below, here are the steps to reproduce the problem:
create a project with a simple
- ping localhost
start a pipeline and cancel it. This should also terminate the ping command (but it doesn't)
on the runner, check whether the process is still running:
ps aux | grep ping
gitlab-+ 19828 0.0 0.0 8656 1724 ? S 07:59 0:00 ping localhost
or just kill it with killall ping (use sudo if the runner runs under another user)
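To test the hypothesis from a script rather than by eye, the check above can be sketched in Python. This is only an illustration: the names `find_processes` and `list_leftover` are hypothetical, and the parser assumes the default 11-column `ps aux` layout shown in the output above. Unlike `ps aux | grep ping`, it matches on the executable name, so the `grep` process itself is not reported.

```python
import os
import subprocess


def find_processes(ps_output, command_name):
    """Return (user, pid, command) for each `ps aux` row running command_name.

    Assumes the default `ps aux` layout: 10 fixed columns followed by the
    full command line as the 11th column.
    """
    matches = []
    for line in ps_output.splitlines():
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip the header row and malformed lines
        user, pid, command = fields[0], int(fields[1]), fields[10]
        argv = command.split()
        # Compare only the executable name, so "grep ping" does not match "ping".
        if argv and os.path.basename(argv[0]) == command_name:
            matches.append((user, pid, command))
    return matches


def list_leftover(command_name):
    """Run `ps aux` on this machine and return the matching processes."""
    result = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    )
    return find_processes(result.stdout, command_name)
```

Calling `list_leftover("ping")` on the runner host after canceling the job returns a non-empty list if the command survived the cancellation.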
At the moment we simply kill the process group with SIGKILL and ignore the result. Instead, we should allow the process to shut down gracefully by first sending SIGTERM and then, after a specific timeout, sending SIGKILL to the process. This will help ensure that processes are killed properly. We already have this implemented in the custom executor and should try to reuse that code to implement the same feature.
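The SIGTERM-then-SIGKILL idea can be sketched as follows. This is a minimal Python illustration for POSIX systems, not the runner's actual code (the runner is written in Go). The names `start_job` and `terminate_process_group` and the 10-second default grace period are assumptions made for the example.

```python
import os
import signal
import subprocess
import sys


def start_job(argv):
    """Start a command as the leader of a new session and process group."""
    return subprocess.Popen(argv, start_new_session=True)


def terminate_process_group(proc, grace_seconds=10.0):
    """Send SIGTERM to the process group of proc, then SIGKILL after a timeout.

    Assumes proc was started with start_new_session=True, so its process
    group ID equals its PID. Returns the exit status from Popen.wait().
    """
    pgid = proc.pid
    try:
        os.killpg(pgid, signal.SIGTERM)  # ask the whole group to shut down
    except ProcessLookupError:
        return proc.wait()  # the group is already gone
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        pass  # grace period expired, fall through to the forced kill
    try:
        os.killpg(pgid, signal.SIGKILL)
    except ProcessLookupError:
        pass
    return proc.wait()
```

For example, `terminate_process_group(start_job(["ping", "localhost"]))` should stop the ping during the grace period. Note that this sketch only waits for the group leader: if the leader exits on SIGTERM while its children ignore the signal, those children survive, so a more thorough version would also sweep the group with SIGKILL afterwards.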
Extract process killer from custom executor
Extract commander interface from custom executor
Add Process groups to
Use the same termination commands on Windows
Rename test file
SIGKILL for shell executor