Orphaned processes send the runner into a death spiral (in limited circumstances)
Summary
Using the powershell executor on Windows Server Core on EC2, jobs that create orphaned processes make the runner permanently unavailable.
This might have a broader impact, but powershell on Windows Server Core on EC2 is our basic repro case.
Steps to reproduce
- Create a Windows Server Core instance on EC2
- Open a powershell window
  - Press ctrl+alt+delete
  - Open task manager
  - Click File -> Run new task
  - Type in powershell.exe and click ok
- Install git with chocolatey
Set-ExecutionPolicy Bypass -Scope Process -Force; iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1'))
choco install git
${env:PATH} = "${env:PATH};C:\Program Files\Git\bin;"
- Download the gitlab-runner exe
wget -UseBasicParsing -outfile gitlab-runner.exe https://gitlab-runner-downloads.s3.amazonaws.com/v10.4.0/binaries/gitlab-runner-windows-amd64.exe
- Write the config.toml at C:\config.toml (example provided later)
- Start the gitlab runner
C:\gitlab-runner.exe -l debug run -c C:\config.toml
- Run a job on the runner with this build script:
Start-Process -NoNewWindow ping "/t www.google.com"
Config.toml:
concurrent = 1
check_interval = 0
metrics_server = ":8080"
[[runners]]
name = "windows-test"
url = "<server url>"
token = "<runner token>"
executor = "shell"
shell = "powershell"
builds_dir = "C:\\builds"
cache_dir = "C:\\cache"
Actual behavior
The job times out. After the job times out or is canceled, the runner stays stuck at "Aborting command..." until the orphaned process is killed.
If you download and run Process Explorer, you can see that the gitlab-runner process has no children while it is stuck. The build script process has exited, but cmd.Wait() hangs until the orphaned processes also exit.
Expected behavior
The job should complete immediately after the build script process exits.
Environment description
We are hosting our own runners in AWS.
Used GitLab Runner version
Version: 10.4.0
Git revision: 857480b6
Git branch: 10-4-stable
GO version: go1.8.5
Built: Mon, 22 Jan 2018 09:48:23 +0000
OS/Arch: windows/amd64
Proposal
Update killer_windows methods Terminate and ForceKill to send the correct signals:
- Terminate: Should send GenerateConsoleCtrlEvent, similar to what buildkite is doing.
- ForceKill: Send a force kill signal with taskkill /F
- Investigate DETACHED_PROCESS: !1797 (comment 297041460)
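As a rough illustration of the ForceKill part of the proposal, the sketch below shells out to taskkill. The names `forceKillArgs` and `forceKill` are hypothetical and do not come from the runner's killer_windows code. The /T flag (terminate the process tree) is an addition that the proposal does not name.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
)

// forceKillArgs builds the taskkill arguments for a forced kill.
// /F forces termination; /T (an assumption, not named in the proposal) also
// targets child processes that taskkill can still find under that PID.
func forceKillArgs(pid int) []string {
	return []string{"/F", "/T", "/PID", strconv.Itoa(pid)}
}

// forceKill is a hypothetical stand-in for the runner's ForceKill method.
// It only works on Windows, where taskkill is available.
func forceKill(pid int) error {
	out, err := exec.Command("taskkill", forceKillArgs(pid)...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("taskkill failed: %v: %s", err, out)
	}
	return nil
}

func main() {
	// Print the command line that would be run for an example PID.
	fmt.Println("taskkill", forceKillArgs(1234))
}
```

The tree walk depends on the parent process still being visible to taskkill. If the build script process has already exited, as in this report, /T may not reach the orphan, which is one reason the Terminate and DETACHED_PROCESS items may still be needed.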