
Gitlab-runner - provide a better systemd file, pid-file, don't rely on the inadequate implementation of Go-service-library

The Gitlab-runner (v16.4) RPM package does not come with a system service file. The "install" sub-command will install one via the "Go service library". However, the result of this command is a woefully inadequate systemd file:

  1. Starting the service does nothing to validate that the service will start, or has started, correctly.
  2. Stopping the service does not happen correctly, with a high probability of orphaned docker containers and processes.
  3. As a specific instance of #2, a DinD configuration allows containers to spin up outside the gitlab-runner's cgroup. When systemd attempts to stop the service, it does so by signalling only the "MAINPID" of the gitlab-runner service. The signals, in sequence, are TERM, HUP, and finally KILL. Gitlab-runner shuts down gracefully only on QUIT. This may/will result in orphaned docker containers or other child processes.
  4. It's not wise to run multiple independent gitlab-runner services on a single host, as systemd cannot always reliably guess the process's PID.
  5. No "reload" option is available.

Work-arounds for some of these issues are mentioned in the documentation, but only as a kind of footnote. This really ought to be fixed in packaging.

The fault lies either with the "Go service library" implementation, or with gitlab-runner's configuration of said library.

The Service "Type" parameter should be set to "exec", or better yet, "notify" as hinted below. (I'd suggest "dbus", but I don't want to trigger programmer apathy.) If "exec" is not available on the system (e.g., RHEL7), then "Type" should be set to "simple" with an "ExecStartPost" command (hints below). Alternatively, use a start-up script that forks off the gitlab-runner service and saves its PID in a pid file, set Type=forking, and replace the ExecStart with the wrapper script.
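
The Type=forking alternative could look roughly like the drop-in below. This is only a sketch: the drop-in file name, the wrapper path /usr/local/sbin/gitlab-runner-wrapper and the pid file /run/gitlab-runner.pid are hypothetical names, and the wrapper itself (which would have to background gitlab-runner and write its PID to that file before exiting) is not shown.

```ini
# Hypothetical drop-in, e.g. /etc/systemd/system/gitlab-runner.service.d/forking.conf
[Service]
# systemd considers start-up complete once the wrapper process exits
Type=forking
# Assumed pid file written by the wrapper; lets systemd identify the main process
PIDFile=/run/gitlab-runner.pid
# An empty assignment clears the ExecStart inherited from the generated unit
ExecStart=
# Hypothetical wrapper script that forks off gitlab-runner and records its PID
ExecStart=/usr/local/sbin/gitlab-runner-wrapper
```

With a pid file in place, systemd no longer has to guess which process is the main one, which is the concern raised about running several runner services on one host.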

Then update the documentation (https://docs.gitlab.com/runner/commands/#gitlab-runner-stop-doesnt-shut-down-gracefully) accordingly. Ironically, a systemd override file is provided there, but without the aforementioned options or a packaged systemd.service file it is quite pointless.

What the user expects to see for a misconfigured runner is:

 systemctl start gitlab-runner
Job for gitlab-runner.service failed because the control process exited with error code. See "systemctl status gitlab-runner.service" and "journalctl -xe" for details.

And for a well-configured runner, the user expects "systemctl stop gitlab-runner" to shut the service down properly.

Hinted override file:

[Service]
Type=simple
TimeoutStartSec=2
ExecStartPost=/bin/sleep 1
ExecStartPost=/usr/bin/gitlab-runner verify
# ^-- 1 second after startup, attempts to verify the runner, non-zero exit code fails the service

ExecReload=/bin/kill -HUP $MAINPID

# For termination: kill the main process, not entire cgroup, allowing proper cleanup
KillMode=process
KillSignal=QUIT
TimeoutStopSec=15
Edited by otheus