Orphaned processes send the runner into a death spiral (in limited circumstances)

Summary

Using the shell executor with PowerShell on Windows Server Core on EC2, jobs that create orphaned processes make the runner unavailable until those processes are killed.

This might have a broader impact, but PowerShell on Windows Server Core on EC2 is our basic repro case.

Steps to reproduce

  1. Create a Windows Server Core instance on EC2
  2. Open a PowerShell window
     • Press ctrl+alt+delete
     • Open task manager
     • Click File -> Run new task
     • Type in powershell.exe and click ok
  3. Install git with chocolatey
     • Set-ExecutionPolicy Bypass -Scope Process -Force; iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1'))
     • choco install git
     • ${env:PATH} = "${env:PATH};C:\Program Files\Git\bin;"
  4. Download the gitlab-runner exe
     • wget -UseBasicParsing -outfile gitlab-runner.exe https://gitlab-runner-downloads.s3.amazonaws.com/v10.4.0/binaries/gitlab-runner-windows-amd64.exe
  5. Write the config.toml at C:\config.toml (example provided later)
  6. Start the gitlab runner
     • C:\gitlab-runner.exe -l debug run -c C:\config.toml
  7. Run a job on the runner with this build script: Start-Process -NoNewWindow ping "/t www.google.com"

Config.toml:

concurrent = 1
check_interval = 0
metrics_server = ":8080"

[[runners]]
  name = "windows-test"
  url = "<server url>"
  token = "<runner token>"
  executor = "shell"
  shell = "powershell"
  builds_dir = "C:\\builds"
  cache_dir = "C:\\cache"

Actual behavior

The job times out. After the job times out or is canceled, the runner gets stuck at Aborting command... until the orphaned process is killed.

If you download and run Process Explorer, you can see that the gitlab-runner process has no children while it is stuck. The build script process has exited, but cmd.Wait() hangs until the orphaned processes also exit.

Expected behavior

The job should complete immediately after the build script process exits.

Environment description

We are hosting our own runners in AWS.

Used GitLab Runner version

Version:      10.4.0
Git revision: 857480b6
Git branch:   10-4-stable
GO version:   go1.8.5
Built:        Mon, 22 Jan 2018 09:48:23 +0000
OS/Arch:      windows/amd64

Proposal

Update killer_windows methods Terminate and ForceKill to send the correct signals:

  1. Terminate: send a console control event with GenerateConsoleCtrlEvent, similar to what Buildkite is doing.
  2. ForceKill: force-kill the process with taskkill /F.
  3. Investigate DETACHED_PROCESS; see !1797 (comment 297041460).

Customers

https://gitlab.my.salesforce.com/00161000004bZxf

Edited by Darren Eastman