Skip to content

Make commander start process group for each process

Steve Xuereb requested to merge add-process-group-to-process-pkg into master

👋 Target branch is extract-comamnder-from-custom-executor untill !1654 (merged) is merged 👋

What does this MR do?

Have the commander create a process group for every command it starts. This is so that each subprocess has the PPID set to the process that the custom executor start which will help with process termination. More information about process groups can be found in http://www.informit.com/articles/article.aspx?p=397655&seqNum=6.

Update the unix killer to be aware of process groups and send the singal to each process instead of just the main process to prevent any orphan processes.

How to test

Simple go project that listens to SIGTERM/SIGKILL and spawns child process

.gitlab-ci.yaml
job:
  script:
  - /Users/steve/Code/gitlab.com/steveazz/siglistener/siglistener
config.toml
[[runners]]
  name = "custom"
  url = "http://192.168.1.79:3000"
  token = "PtUw_SJzn2y--tEKUvJw"
  executor = "custom"
  builds_dir = "/tmp/builds"
  cache_dir = "/tmp/cache"
  [runners.custom]
    run_exec = "/Users/steve/Code/gitlab.com/custom-executor-drivers/debug/run.sh"

    graceful_kill_timeout = 20
run.sh
#!/usr/bin/env bash

/bin/bash $1

Shell executor

Running the Runner binary locally with go run main.go

ps -opid,pgid,command
$ ps -opid,pgid,command 

24415 24415 go run main.go run -c /Users/steve/Code/gitlab.com/gitlab-org/gitlab/gitlab-runner-config.toml                                                                     <--- Runner process
24443 24415 /var/folders/h4/45_ps9bn4jb6w51g1ngy_n5c0000gn/T/go-build583985105/b001/exe/main run -c /Users/steve/Code/gitlab.com/gitlab-org/gitlab/gitlab-runner-config.toml
27195 27195 bash --login                                                                                                                                                       <--- Script from shell executor has it's own process group
27199 27195 bash --login
27200 27195 /Users/steve/Code/gitlab.com/steveazz/siglistener/siglistener
27204 27195 (sleep)
27205 27195 (sleep)
27206 27195 (sleep)
27207 27195 (sleep)
27208 27195 (sleep)
27209 27195 (sleep)
27210 27195 (sleep)
27211 27195 (sleep)
27212 27195 (sleep)
27213 27195 (sleep)

Steps

  1. Start job
  2. Run ps -opid,pgid,command same user as Runner to see the process
  3. Cancel job
  4. Just SIGKILl is sent to the siglistener and sub-processes.

Custom executor (master)

Notice how the process group of the siglistener and sleep are all for the go run main.go the process not /Users/steve/Code/gitlab.com/steveazz/siglistener/siglistener as how it would be for the shell executor.

ps -opid,pgid,command
$ ps -opid,pgid,command 

16934 16934 go run main.go run -c /Users/steve/Code/gitlab.com/gitlab-org/gitlab/gitlab-runner-config.toml                                                                                          <--- Runner process
16962 16934 /var/folders/h4/45_ps9bn4jb6w51g1ngy_n5c0000gn/T/go-build263333944/b001/exe/main run -c /Users/steve/Code/gitlab.com/gitlab-org/gitlab/gitlab-runner-config.toml
17164 16934 bash /Users/steve/Code/gitlab.com/custom-executor-drivers/debug/run.sh /var/folders/h4/45_ps9bn4jb6w51g1ngy_n5c0000gn/T/custom-executor808275188/script062524327/script. build_script   <--- Exec of build script still attached to the main runner process
17165 16934 /bin/bash /var/folders/h4/45_ps9bn4jb6w51g1ngy_n5c0000gn/T/custom-executor808275188/script062524327/script.
17167 16934 /bin/bash /var/folders/h4/45_ps9bn4jb6w51g1ngy_n5c0000gn/T/custom-executor808275188/script062524327/script.
17168 16934 /Users/steve/Code/gitlab.com/steveazz/siglistener/siglistener
17169 16934 sleep 60
17170 16934 sleep 60
17171 16934 sleep 60
17172 16934 sleep 60
17173 16934 sleep 60
17174 16934 sleep 60
17175 16934 sleep 60
17176 16934 sleep 60
17177 16934 sleep 60
17178 16934 sleep 60
  1. Start job
  2. Run ps -opid,pgid,command same user as Runner to see the process
  3. Cancel job
  4. SIGTERM is sent to the run.sh process and it's terminated properly but all other processes are still around since the GPID is set to the Runner process

Custom executor (this MR)

Notice how the process group is now set to the siglistener for the child processes.

ps -opid,pgid,command
$ ps -opid,pgid,command

66515 66515 go run main.go run -c /Users/steve/Code/gitlab.com/gitlab-org/gitlab/gitlab-runner-config.toml                                                                                          <--- Runner process
66991 66515 /var/folders/h4/45_ps9bn4jb6w51g1ngy_n5c0000gn/T/go-build545902909/b001/exe/main run -c /Users/steve/Code/gitlab.com/gitlab-org/gitlab/gitlab-runner-config.toml
67086 67086 bash /Users/steve/Code/gitlab.com/custom-executor-drivers/debug/run.sh /var/folders/h4/45_ps9bn4jb6w51g1ngy_n5c0000gn/T/custom-executor612101401/script079237912/script. build_script   <--- run.sh set as progress group
67087 67086 /bin/bash /var/folders/h4/45_ps9bn4jb6w51g1ngy_n5c0000gn/T/custom-executor612101401/script079237912/script.
67089 67086 /bin/bash /var/folders/h4/45_ps9bn4jb6w51g1ngy_n5c0000gn/T/custom-executor612101401/script079237912/script.
67090 67086 /Users/steve/Code/gitlab.com/steveazz/siglistener/siglistener
67091 67086 (sleep)
67092 67086 (sleep)
67093 67086 (sleep)
67094 67086 (sleep)
67095 67086 (sleep)
67096 67086 (sleep)
67097 67086 (sleep)
67098 67086 (sleep)
67099 67086 (sleep)
67100 67086 (sleep)

Steps

  1. Start job
  2. Run ps -opid,pgid,command same user as Runner to see the process
  3. Cancel job
  4. Run tail -f /tmp/siglistener.log to see logs of the siglistener. In the logs there should be a message saying that SIGTERM was recived, and after 20 tick the process is killed. See we received a SIGTERM, and after 20 ticks the process was killed
  5. Check if process tree is cleaned up with ps -opid,pgid,command

Why was this MR needed?

  • Prevent subprocesses of the custom executor to live forever/until the main Runner process exits
  • Consolidated the shell executor and custom executor process logic together

👋 When reviewing this merge request take a look at !1551 (closed) 👋

Does this MR meet the acceptance criteria?

  • Documentation created/updated
  • Added tests for this feature/bug
  • In case of conflicts with master - branch was rebased

What are the relevant issue numbers?

Edited by Steve Xuereb

Merge request reports