Skip to content

Don't use SIGKILL on su process of shell executor

What does this MR do?

Fixes a terrible problem that leaves my builds running after I've cancelled them from the the web gui. In my case, these are OpenWRT builds that can take 8+ hours.

Why was this MR needed?

Because the current implementation is broken and it will always or almost always leave child processes running because they never get HUP.

Are there points in the code the reviewer needs to double check?

I think the code should be immediately committed as-is. However, somebody who knows golang and gitlab-runner better should revise the way it cancels processes spawned by the shell executor. Specifically:

  1. Send INT or TERM.
  2. Join on the child process with a timeout of x milliseconds where x is either configurable or 1000.
  3. If the child process is still alive go ahead and send KILL. Even if the child(ren) are still working on stopping (possibly tearing down docker containers, etc.) they will have almost certainly gotten the message by that time.

Does this MR meet the acceptance criteria?

  • [?] Documentation created/updated
  • Added tests for this feature/bug
  • In case of conflicts with master - branch was rebased

What are the relevant issue numbers?

Closes #3101 (closed)

Merge request reports