Kubernetes executor loses pods when restarting
Summary
When using the Kubernetes executor, any jobs in flight when the runner is restarted are left orphaned.
- GitLab thinks the jobs are still running and will continue to do so until you cancel them manually (even the job timeout doesn't kick in)
- This causes all your pipelines to stall
- The pods are still running in the cluster
- The pods don't seem to ever exit, since the job isn't running as the main process in the pod
Steps to reproduce
- Run a runner with the Kubernetes executor
- Start some jobs
- Restart the runner, e.g. by deleting its pod or updating the deployment YAML (see the sketch just below)
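
For the restart step, this is roughly what I do, as a minimal sketch using the Python kubernetes client. The `gitlab-managed-apps` namespace and the `runner-gitlab-runner` name prefix are what the GitLab Kubernetes integration used in my case; treat both as assumptions and adjust for your deployment.

```python
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() if running inside the cluster
v1 = client.CoreV1Api()

NAMESPACE = "gitlab-managed-apps"  # assumption: namespace used by the GitLab integration

# Delete the runner manager pod; its Deployment recreates it immediately,
# which reproduces the restart (same effect as a node preemption).
for pod in v1.list_namespaced_pod(NAMESPACE).items:
    if pod.metadata.name.startswith("runner-gitlab-runner"):  # assumption: deployment name
        v1.delete_namespaced_pod(pod.metadata.name, NAMESPACE)
        print(f"deleted runner pod {pod.metadata.name}")
```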
Actual behavior
- Pods that your jobs were running in continue to run indefinitely (see the listing sketch after this list)
- GitLab thinks the jobs are still running
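
To make the first point concrete, this is how I spot the orphaned job pods after a restart. It assumes the Kubernetes executor's pod naming (a `runner-` prefix with a `-project-` segment), which is what I see in my cluster.

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Job pods created by the Kubernetes executor are named like
# runner-<token>-project-<id>-concurrent-<n>-...; filter on that pattern.
for pod in v1.list_namespaced_pod("gitlab-managed-apps").items:  # namespace is an assumption
    name = pod.metadata.name
    if name.startswith("runner-") and "-project-" in name:
        print(name, pod.status.phase, pod.status.start_time)
```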
Expected behavior
- Pods cleaned up
- Jobs failed
Or even better
- Runner reconnects to running jobs
Environment description
Self-hosted GitLab (Omnibus) and runner in a GKE cluster.
The cluster uses preemptible nodes, so the runner is restarted at least every 24h. I've also been changing the config quite a bit so far, and every time I do so I have to mop up the broken jobs (my mop-up sketch follows below).
The runner was installed using the GitLab Kubernetes integration, although I've since updated it to add a cache config and use a newer runner version.
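
For reference, this is the kind of mop-up I end up running after each restart, again only a sketch: it deletes job pods older than a cutoff, on the assumption that any legitimate job would have finished by then. Namespace, naming pattern, and cutoff are all assumptions to adjust.

```python
from datetime import datetime, timedelta, timezone
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

NAMESPACE = "gitlab-managed-apps"   # assumption
CUTOFF = timedelta(hours=2)         # assumption: no real job runs longer than this

now = datetime.now(timezone.utc)
for pod in v1.list_namespaced_pod(NAMESPACE).items:
    name = pod.metadata.name
    # Only touch pods that match the executor's job-pod naming pattern,
    # which excludes the runner manager pod itself.
    if not (name.startswith("runner-") and "-project-" in name):
        continue
    # start_time is a timezone-aware datetime in the Python client.
    if pod.status.start_time and now - pod.status.start_time > CUTOFF:
        v1.delete_namespaced_pod(name, NAMESPACE)
        print(f"deleted stale job pod {name}")
```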
Used GitLab Runner version
gitlab/gitlab-runner:alpine-v10.6.0