Send `SIGTERM` then `SIGKILL` to process in Shell executor
add-process-group-to-process-pkg
until !1743 (merged) is merged
What does this MR do?
When the job is canceled, first send `SIGTERM` and then send `SIGKILL` after a specific timeout. This uses the same package that is used for the custom executor.
Why was this MR needed?
Looking at #3376 (closed), most processes end up hanging when they are killed directly, especially if they have child processes. This MR implements graceful termination for both Windows and Unix systems behind the feature flag `FF_SHELL_EXECUTOR_USE_LEGACY_PROCESS_KILL`, which is turned off by default (so the new graceful termination is turned on by default).
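To illustrate the "off by default" semantics of the flag, here is a small sketch. The helper `useLegacyKill` and the idea of passing job variables as a map are assumptions for the example, not the runner's actual feature flag code; only the flag name comes from this MR.

```go
package main

import (
	"fmt"
	"strconv"
)

// Flag name taken from the MR description.
const legacyKillFlag = "FF_SHELL_EXECUTOR_USE_LEGACY_PROCESS_KILL"

// useLegacyKill (hypothetical helper) reports whether the legacy direct kill
// should be used. An unset or unparsable value counts as "off", which means
// the new graceful termination is used.
func useLegacyKill(vars map[string]string) bool {
	value, ok := vars[legacyKillFlag]
	if !ok {
		return false
	}
	enabled, err := strconv.ParseBool(value)
	return err == nil && enabled
}

func main() {
	fmt.Println(useLegacyKill(map[string]string{}))                       // flag unset
	fmt.Println(useLegacyKill(map[string]string{legacyKillFlag: "true"})) // flag on
}
```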
Testing
Unix
- Download and compile the following Go application
- Configure shell executor in `config.toml`:

      [[runners]]
        name = "steve-mbp-gitlab.local"
        url = "http://192.168.1.79:3000/"
        token = "xxxx"
        executor = "shell"
- Have the following `.gitlab-ci.yml`:

      job:
        script:
          - /Users/steve/Code/gitlab.com/steveazz/siglistener/siglistener
- Run job
- Run `ps -opid,pgid,command`, which will show all the running processes and their process groups:

      47098 47098 go run main.go run -c /Users/steve/Code/gitlab.com/gitlab-org/gitlab/gitlab-runner-config.toml
      47146 47098 /var/folders/h4/45_ps9bn4jb6w51g1ngy_n5c0000gn/T/go-build653627740/b001/exe/main run -c /Users/steve/Code/gitlab.com/gitlab-org/gitlab/gitlab-runner-config.toml
      47289 47289 bash --login <--- In its own process group
      47293 47289 bash --login
      47294 47289 /Users/steve/Code/gitlab.com/steveazz/siglistener/siglistener
      47295 47289 sleep 60
      47296 47289 sleep 60
      47297 47289 sleep 60
      47298 47289 sleep 60
      47299 47289 sleep 60
      47300 47289 sleep 60
      47301 47289 sleep 60
      47302 47289 sleep 60
      47303 47289 sleep 60
      47304 47289 sleep 60
- Run `tail -f /tmp/siglistener.log`
- Cancel job
- Look at the `/tmp/siglistener.log` file, which will print the following logs:

      2020/01/28 11:34:20 Received the following signal: terminated
      2020/01/28 11:34:21 tick
      2020/01/28 11:34:22 tick
      2020/01/28 11:34:23 tick
- Wait 10 minutes, after which the process and its child processes are killed
Windows
- Download and compile the following Go application
- Configure shell executor in `config.toml`:

      [[runners]]
        name = "steve-mbp-gitlab.local"
        url = "http://192.168.1.79:3000/"
        token = "xxxx"
        executor = "shell"
- Have the following `.gitlab-ci.yml`:

      job:
        script:
          - C:\GitLab-Runner\builds\siglistener.exe
- Run job
- Run `gwmi win32_process | % { "$($_.ProcessID) $($_.ParentProcessID) $($_.GetOwner().User) $($_.CommandLine)" }`, which will print all the processes:

      940 1904 Administrator "C:\GitLab-Runner\out\binaries\gitlab-runner-windows-amd64.exe" run -c .\config.toml
      1476 940 Administrator powershell -noprofile -noninteractive -executionpolicy Bypass -command C:\Users\ADMINI~1\AppData\Local\Temp\2\build_script155394518\script.ps1
      2892 1476 Administrator "C:\GitLab-Runner\builds\siglistener.exe"
- Cancel job
- Run `gwmi win32_process | % { "$($_.ProcessID) $($_.ParentProcessID) $($_.GetOwner().User) $($_.CommandLine)" }` again and notice how `1476 940 Administrator powershell -noprofile -noninteractive -executionpolicy Bypass -command C:\Users\ADMINI~1\AppData\Local\Temp\2\build_script155394518\script.ps1` is not there anymore, but `2892 1476 Administrator "C:\GitLab-Runner\builds\siglistener.exe"` is.
- Wait 10 minutes and the process should be killed.
Are there points in the code the reviewer needs to double check?
Does this MR meet the acceptance criteria?
- Documentation created/updated
- Added tests for this feature/bug
- In case of conflicts with master - branch was rebased
What are the relevant issue numbers?
- Broken out of !1551 (closed)
- closes #3376 (closed)
- closes #4438 (closed)
dev log
- 2019-01-16: Initial push to use a feature flag and use the process termination/killing
- 2019-01-20: Refactoring and additional testing
- 2019-01-22: Add tests
- 2019-01-27: Fix Windows tests and add documentation
- 2019-01-28: Full test coverage