Skip to content

gitlab-runner graceful termination

Adrien Kohlbecker requested to merge ak/graceful-termination into main

What does this MR do?

Ensure that when the macos-internal-runner ASG needs to renew instances, it happens without killing running jobs.

  1. Set up a lifecycle hook on the ASG. When it wants to terminate an instance, it puts it into the Terminating:Wait state.
  2. A process running on the instance monitors the target state, and when it changes to Terminated, it starts graceful shutdown
  3. This will gracefully kill the runner and unregister it from GitLab
  4. At the end, it tells the ASG to complete the lifecycle, this puts the instance in the Terminating:Proceed state, which AWS can then terminate

Because gitlab-runner expects SIGQUIT for graceful termination, and launchd is hardcoded to kill processes with SIGTERM, I wrap gitlab-runner with dumb-init and use its signal rewriting capability.

Note: dumb-init is not released for darwin arm64, I used pip install dumb-init to get it built, and I'm including it in the repo so we can avoid setting up a Python environment for the AMI.

Why was this MR needed?

Deploying new AMIs killed running jobs

What's the best way to test this MR?

Initiate an instance refresh while a job is running, the job should continue

What are the relevant issue numbers?

https://gitlab.com/gitlab-org/ci-cd/shared-runners/macos/-/issues/20

Edited by Adrien Kohlbecker

Merge request reports