gitlab-runner graceful termination
What does this MR do?
Ensure that when the macos-internal-runner
ASG needs to renew instances, it happens without killing running jobs.
- Set up a lifecycle hook on the ASG. When it wants to terminate an instance, it puts it into the
Terminating:Wait
state. - A process running on the instance monitors the target state, and when it changes to
Terminated
, it starts graceful shutdown - This will gracefully kill the runner and unregister it from GitLab
- At the end, it tells the ASG to complete the lifecycle, this puts the instance in the
Terminating:Proceed
state, which AWS can then terminate
Because gitlab-runner
expects SIGQUIT for graceful termination, and launchd
is hardcoded to kill processes with SIGTERM, I wrap gitlab-runner
with dumb-init
and use its signal rewriting capability.
Note: dumb-init is not released for darwin arm64, I used pip install dumb-init
to get it built, and I'm including it in the repo so we can avoid setting up a Python environment for the AMI.
Why was this MR needed?
Deploying new AMIs killed running jobs
What's the best way to test this MR?
Initiate an instance refresh while a job is running, the job should continue
What are the relevant issue numbers?
https://gitlab.com/gitlab-org/ci-cd/shared-runners/macos/-/issues/20
Edited by Adrien Kohlbecker