Fix non terminating runner in register loop

Clemens Beck requested to merge fix-non-terminating-runner-in-register-loop into main

What does this MR do?

Fix non terminating runner in register loop

Unregistered runners that are asked to shut down (e.g. via kubectl delete ...)
get stuck in "Terminating" because the signal is not processed
until the register script (with its 30 retries) completes.

Fixed by trapping the SIGINT and SIGQUIT signals.

Changelog: fixed
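
As a rough illustration of the signal trapping described above, here is a minimal sketch of a retry loop that reacts to SIGINT and SIGQUIT instead of running all retries first. It is not the chart's actual register script: `register_runner`, `register_with_retries`, the delay argument, and the 130 return code are hypothetical names and choices made for this example only.

```shell
#!/bin/sh
# Hypothetical stand-in for the real registration command; it always fails
# here so that the retry path is exercised.
register_runner() {
  return 1
}

# Try to register up to $1 times, pausing $2 seconds between attempts.
# Returns 0 on success, 1 when retries are exhausted, and 130 (an arbitrary
# choice for this sketch) when a shutdown signal arrived.
register_with_retries() {
  max_retries=$1
  delay=$2
  stop_requested=0

  # Record the shutdown request instead of leaving it unhandled until the
  # loop has finished all retries.
  trap 'stop_requested=1' INT QUIT

  attempt=1
  status=1
  while [ "$attempt" -le "$max_retries" ]; do
    if register_runner; then
      status=0
      break
    fi
    if [ "$stop_requested" -eq 1 ]; then
      status=130
      break
    fi
    # Sleep in the background and wait on it: a trapped signal interrupts
    # `wait` right away, whereas the shell would only run the trap after a
    # foreground sleep had finished.
    sleep "$delay" &
    sleep_pid=$!
    wait "$sleep_pid" || true
    kill "$sleep_pid" 2>/dev/null || true
    if [ "$stop_requested" -eq 1 ]; then
      status=130
      break
    fi
    attempt=$((attempt + 1))
  done

  # Restore the default signal handling before returning.
  trap - INT QUIT
  return "$status"
}
```

With the 30 retries mentioned above, a call might look like `register_with_retries 30 5`, where the 5-second delay is an assumed value. Setting a flag in the trap, rather than calling `exit` there, keeps the function reusable and lets the caller decide how to shut down.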

Why was this MR needed?

This fixes a problem often seen in GitLab chart pipelines, where jobs wait a very long time for a runner pod to finish.

What's the best way to test this MR?

  1. Install a stable Helm release where a runner cannot register (e.g. with a wrong registration token)
  2. Run kubectl delete pod <runner_pod> to delete the runner
  3. Confirm the runner is stuck in "Terminating"
  4. Upgrade to this branch
  5. Run kubectl delete pod <runner_pod> to delete the runner
  6. Confirm the runner terminates

Note: dumb-init's verbose mode (--verbose) might be helpful.

What are the relevant issue numbers?

Edited by Clemens Beck
