Allow graceful termination on Windows

What does this MR do?

Ensures Runner can exit correctly on Windows.

We only treat syscall.SIGQUIT as the graceful shutdown command, and treat other signals (SIGTERM, SIGINT) as a forceful shutdown.

Ideally, we would attempt a graceful shutdown in all cases and upon receiving a second signal, do so forcefully. However, that could be a breaking change.

On Windows, however, the os/signal package translates CTRL+C and CTRL+BREAK to SIGINT, and CTRL_CLOSE_EVENT, CTRL_LOGOFF_EVENT and CTRL_SHUTDOWN_EVENT to SIGTERM. As a result, both graceful and forceful shutdowns are effectively broken. Changing the behaviour here to always attempt a graceful shutdown first is acceptable, as the previous behaviour did not work at all.

This MR translates the signals received on Windows (SIGTERM and SIGINT) to a SIGQUIT internally. This also works for Runner when installed as a Windows Service.

Why was this MR needed?

Exiting on Windows is difficult.

What's the best way to test this MR?

Without this change applied, on Windows:

./gitlab-runner.exe --debug run

While it is polling for jobs, hit Ctrl+C.

The following will happen:

WARNING: Graceful shutdown not finished properly    builds=0 error=received: <nil>
WARNING: Starting forceful shutdown                 StopSignal=<nil> builds=0
Broadcasting job abort signal                       builds=0
Broadcasting interrupt signal                       builds=0
Feeding runners to channel                          builds=0
WARNING: Forceful shutdown not finished properly    builds=0 error=shutdown timed out
FATAL: Service run failed                           error=shutdown timed out

This takes a long time to complete, and eventually fails with a "shutdown timed out" error.

With this MR applied, the same steps should result in:

WARNING: Starting graceful shutdown, waiting for builds to finish  StopSignal=quit builds=0
Broadcasting interrupt signal                       builds=0
All workers stopped. Can exit now                   builds=0

If the runner has picked up a long-running job, stopping it should require two CTRL+C presses. The first starts a graceful shutdown; the second forces the shutdown, which now completes correctly instead of timing out.

What are the relevant issue numbers?

closes #4173 (closed)

Edited by Arran Walker
