
Send `SIGTERM` then `SIGKILL` to process in Shell executor

Steve Xuereb requested to merge shell-executor-use-process-pkg into master

👋 Target branch for now is add-process-group-to-process-pkg until !1743 (merged) is merged 👋

What does this MR do?

When a job is canceled, first send SIGTERM to the process, then send SIGKILL if it has not exited after a specific timeout. This uses the same package that is used for the custom executor.

Why was this MR needed?

As seen in #3376 (closed), most processes end up hanging when they are killed directly, especially if they have child processes. This MR implements graceful termination for both Windows and Unix systems behind the feature flag FF_SHELL_EXECUTOR_USE_LEGACY_PROCESS_KILL, which is turned off by default (so the new graceful termination is turned on by default).

Testing

Unix

  1. Download and compile the following Go application

  2. Configure shell executor

    config.toml
    [[runners]]
      name = "steve-mbp-gitlab.local"
      url = "http://192.168.1.79:3000/"
      token = "xxxx"
      executor = "shell"
  3. Have the following .gitlab-ci.yml

    .gitlab-ci.yml
    job:
      script:
      - /Users/steve/Code/gitlab.com/steveazz/siglistener/siglistener
  4. Run job

  5. Run ps -opid,pgid,command, which shows all the running processes and their process groups

    ps -opid,pgid,command
    47098 47098 go run main.go run -c /Users/steve/Code/gitlab.com/gitlab-org/gitlab/gitlab-runner-config.toml
    47146 47098 /var/folders/h4/45_ps9bn4jb6w51g1ngy_n5c0000gn/T/go-build653627740/b001/exe/main run -c /Users/steve/Code/gitlab.com/gitlab-org/gitlab/gitlab-runner-config.toml
    47289 47289 bash --login    <--- In its own process group
    47293 47289 bash --login
    47294 47289 /Users/steve/Code/gitlab.com/steveazz/siglistener/siglistener                 
    47295 47289 sleep 60
    47296 47289 sleep 60
    47297 47289 sleep 60
    47298 47289 sleep 60
    47299 47289 sleep 60
    47300 47289 sleep 60
    47301 47289 sleep 60
    47302 47289 sleep 60
    47303 47289 sleep 60
    47304 47289 sleep 60
  6. Run tail -f /tmp/siglistener.log

  7. Cancel Job

  8. Look at the /tmp/siglistener.log file which will print

    logs
    2020/01/28 11:34:20 Received the following signal: terminated
    2020/01/28 11:34:21 tick
    2020/01/28 11:34:22 tick
    2020/01/28 11:34:23 tick
    
  9. Wait 10 minutes, after which the process and its child processes are killed

Windows

  1. Download and compile the following Go application

  2. Configure shell executor

    config.toml
    [[runners]]
      name = "steve-mbp-gitlab.local"
      url = "http://192.168.1.79:3000/"
      token = "xxxx"
      executor = "shell"
  3. Have the following .gitlab-ci.yml

    .gitlab-ci.yml
    job:
      script:
      - C:\GitLab-Runner\builds\siglistener.exe
  4. Run job

  5. Run gwmi win32_process | % { "$($_.ProcessID) $($_.ParentProcessID) $($_.GetOwner().User) $($_.CommandLine)" } which will print all the processes

    process
    940 1904 Administrator "C:\GitLab-Runner\out\binaries\gitlab-runner-windows-amd64.exe" run -c .\config.toml
    1476 940 Administrator powershell -noprofile -noninteractive -executionpolicy Bypass -command C:\Users\ADMINI~1\AppData\Local\Temp\2\build_script155394518\script.ps1
    2892 1476 Administrator "C:\GitLab-Runner\builds\siglistener.exe"
  6. Cancel job

  7. Run gwmi win32_process | % { "$($_.ProcessID) $($_.ParentProcessID) $($_.GetOwner().User) $($_.CommandLine)" } and notice how 1476 940 Administrator powershell -noprofile -noninteractive -executionpolicy Bypass -command C:\Users\ADMINI~1\AppData\Local\Temp\2\build_script155394518\script.ps1 is not there anymore but 2892 1476 Administrator "C:\GitLab-Runner\builds\siglistener.exe" is.

  8. Wait 10 minutes and the process should be killed.

Are there points in the code the reviewer needs to double check?

👋 When reviewing this merge request take a look at !1551 (closed) 👋

Does this MR meet the acceptance criteria?

  • Documentation created/updated
  • Added tests for this feature/bug
  • In case of conflicts with master - branch was rebased

What are the relevant issue numbers?


dev log
Edited by Steve Xuereb
