
Terminating or interrupting a Runner Manager K8s pod causes runner worker pods and jobs to be orphaned and to exceed the defined job timeout

Summary

If the k8s runner manager pod that manages an ephemeral job is restarted, the ephemeral runner worker pod is orphaned. Orphaned means the worker pod is not cleaned up, even after the job exits, so it keeps running on the node and consuming resources.

Steps to reproduce

  • Have a runner pod (runner-1) with a tag (k8s-tiny), registered on gitlab.com with a runner ID (for example, 12345).
  • Run a job (Job-1) with an x-minute timeout and the k8s-tiny tag.
  • Assume Job-1 lands on the ephemeral-1 worker pod belonging to runner-1.
  • Delete the runner-1 pod from k8s.
  • A new runner-1 pod is created, but on gitlab.com this new runner pod has a different runner ID.
  • Define the timeout: keyword in the job as 1 minute.
  • Start the pipeline and terminate the pod while it is running the job.
  • The job continues running beyond the 1-minute timeout, even though the pod was terminated.
  • 100 minutes later, the job is marked as failed.
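The steps above hinge on the runner ID change: worker pods created under the old ID (12345) are no longer tracked by any registered runner manager. A minimal Python sketch of how such orphans could be identified, assuming a hypothetical `WorkerPod` record that stores the runner ID of the manager that created it (this is not the Kubernetes or GitLab Runner API, and the new runner ID 67890 is made up for illustration):

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class WorkerPod:
    # Hypothetical record for an ephemeral runner worker pod; the fields are
    # illustrative assumptions, not real Kubernetes or GitLab Runner objects.
    name: str
    job_id: int
    runner_id: int  # runner ID of the manager pod that created this worker


def find_orphaned_pods(pods: Iterable[WorkerPod], active_runner_ids: Set[int]) -> List[WorkerPod]:
    """Return worker pods whose creating runner manager is no longer registered.

    After the manager pod restarts under a new runner ID, nothing is left to
    clean up pods created under the old ID, so they keep consuming resources.
    """
    return [pod for pod in pods if pod.runner_id not in active_runner_ids]


# Scenario from the steps: ephemeral-1 was created by runner 12345, and the
# restarted runner-1 pod registered with a new (hypothetical) ID, 67890.
pods = [WorkerPod("ephemeral-1", job_id=1, runner_id=12345)]
orphans = find_orphaned_pods(pods, active_runner_ids={67890})
print([pod.name for pod in orphans])  # ['ephemeral-1']
```

Keying on the creating runner ID is only one possible signal; a real cleanup would also need to confirm the job itself has finished before removing a pod.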

Example Project

Example job: https://gitlab.com/jdasmarinas/gitlab-runner-secrets-test/-/jobs/3725538578

What is the current bug behavior?

The job does not respect its timeout if the pod is terminated just after the job has started.

What is the expected correct behavior?

The job should be marked as failed in GitLab based on the timeout defined and should not wait for an update from the runner.
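The expected behavior amounts to GitLab enforcing the job timeout on its own clock. A minimal sketch, assuming a hypothetical `expected_status` helper and that a terminal status already reported by the runner takes precedence (this is an illustration, not GitLab's actual implementation):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def expected_status(
    started_at: datetime,
    timeout: timedelta,
    now: datetime,
    runner_reported: Optional[str] = None,
) -> str:
    """Hypothetical server-side decision for a running job.

    A terminal status reported by the runner is kept; otherwise the job fails
    as soon as its timeout elapses, without waiting for any runner update.
    """
    if runner_reported in ("success", "failed"):
        return runner_reported
    if now - started_at >= timeout:
        return "failed"
    return "running"


start = datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc)
# The runner manager pod was terminated, so no update ever arrives.
print(expected_status(start, timedelta(minutes=1), start + timedelta(minutes=2)))  # failed
```

With the 1-minute timeout from the reproduction steps, this would fail the job shortly after one minute rather than 100 minutes later.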

Relevant logs and/or screenshots

Output of checks

Reproduced on GitLab.com.

Results of GitLab environment info


(For installations with omnibus-gitlab package run and paste the output of:
`sudo gitlab-rake gitlab:env:info`)

(For installations from source run and paste the output of:
`sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)

Results of GitLab application Check


(For installations with omnibus-gitlab package run and paste the output of:
`sudo gitlab-rake gitlab:check SANITIZE=true`)

(For installations from source run and paste the output of:
`sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true`)

(we will only investigate if the tests are passing)

Possible fixes

Edited by Darren Eastman