Kubernetes executor loses pods when restarting

Summary

When using the Kubernetes executor, whenever the runner is restarted, its jobs are left orphaned.

  • GitLab thinks the jobs are still running and will keep doing so until you cancel them manually (even the job timeout doesn't kick in)
  • This causes all your pipelines to stall
  • The pods are still running in the cluster
  • The pods don't ever seem to exit, as the job isn't running as the main process in the pod

Steps to reproduce

  1. Run a runner using the Kubernetes executor
  2. Start some jobs
  3. Restart the runner (e.g. by deleting its pod or updating the Deployment YAML)
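For anyone trying to reproduce step 3: assuming the runner was deployed via the GitLab Kubernetes integration or Helm chart (the `gitlab-managed-apps` namespace and `app=gitlab-runner` label below are assumptions — check your own install first), the restart can be triggered with something like:

```shell
# Delete the runner pod; its Deployment recreates it immediately.
# Namespace and label are assumptions for a typical chart-based install —
# verify them with `kubectl get pods --all-namespaces` first.
kubectl -n gitlab-managed-apps delete pod -l app=gitlab-runner
```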

Actual behavior

  • Pods that your jobs are running in continue to run indefinitely
  • GitLab thinks the jobs are still running

Expected behavior

  • Pods are cleaned up
  • Jobs are marked as failed

Or even better

  • The runner reconnects to the still-running jobs

Environment description

Self-hosted GitLab (Omnibus) and runner in a GKE cluster.

The cluster uses preemptible nodes, so the runner is restarted at least every 24 hours. I've also been changing the config quite a bit, and every time I do, I have to mop up the broken jobs.
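For what it's worth, the mopping up can be scripted. This is only a sketch under the assumptions that job pods live in the same namespace as the runner and carry the executor's usual `runner-` name prefix — verify both against your cluster before deleting anything:

```shell
# List leftover job pods, then delete them. The "runner-" prefix and the
# namespace are assumptions — inspect the output of the first command
# before piping into delete.
kubectl -n gitlab-managed-apps get pods -o name \
  | grep '^pod/runner-' \
  | xargs -r kubectl -n gitlab-managed-apps delete
```

The stuck jobs in GitLab still have to be cancelled in the UI or via the API; deleting the pods only reclaims the cluster resources.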

The runner was installed using the GitLab Kubernetes integration, although I've since updated it to add a cache config and use a newer runner version.

Used GitLab Runner version

gitlab/gitlab-runner:alpine-v10.6.0

Edited Apr 19, 2018 by Fred Cox