Kubernetes executor loses pods when restarting
Summary
When using the Kubernetes executor, whenever the runner is restarted, its in-flight jobs are left orphaned.
- GitLab thinks the jobs are still running and keeps doing so until you cancel them manually (even the job timeout doesn't kick in)
- This causes all your pipelines to stall
- The pods are still running in the cluster
- The pods don't ever seem to quit, as the job isn't running as the main process in the pod
Steps to reproduce
- Run a GitLab Runner with the Kubernetes executor
- Start some jobs
- Restart the runner, e.g. by deleting its pod or updating the deployment YAML (see the commands below)
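For reference, a minimal way to trigger this; the gitlab-managed-apps namespace is what the GitLab Kubernetes integration installs into, adjust if yours differs:

```shell
# Find the runner pod
kubectl get pods -n gitlab-managed-apps

# Delete it while CI jobs are in flight; the Deployment recreates the runner straight away
kubectl delete pod -n gitlab-managed-apps <runner-pod-name>
```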
Actual behavior
- Pods that your jobs are running in continue to run indefinitely
- GitLab thinks the jobs are still running
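For illustration, the orphaned job pods are easy to spot after the restart (the executor names them with a `-project-<id>-concurrent-<n>` part; exact names vary):

```shell
# Job pods created by the Kubernetes executor keep running even though the runner that spawned them is gone
kubectl get pods -n gitlab-managed-apps | grep -- '-project-.*-concurrent-'
```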
Expected behavior
- Pods are cleaned up
- Jobs are marked as failed
Or, even better:
- The runner reconnects to its running jobs
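Until something like that exists, the mopping up I mention in the environment section below is roughly this manual sweep (assuming no legitimate jobs are running at the time, same namespace assumption as above):

```shell
# Delete every leftover job pod created by the Kubernetes executor
kubectl get pods -n gitlab-managed-apps --no-headers \
  | awk '/-project-.*-concurrent-/ {print $1}' \
  | xargs -r kubectl delete pod -n gitlab-managed-apps
```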
Environment description
Self-hosted GitLab (Omnibus) and runner in a GKE cluster.
The cluster uses preemptible nodes, so the runner is restarted at least every 24 hours. So far I've been changing the config quite a bit, and every time I do so I have to mop up the broken jobs.
The runner was installed using the GitLab Kubernetes integration, although I've since updated it to add a cache config and use a newer runner version.
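For reference, the rough shape of the config involved; every value here is a placeholder rather than my real config, and the cache block uses the flat S3-compatible schema this runner version expects:

```toml
concurrent = 4

[[runners]]
  name = "gke-runner"
  url = "https://gitlab.example.com/"
  token = "REDACTED"
  executor = "kubernetes"
  [runners.kubernetes]
    namespace = "gitlab-managed-apps"
    image = "alpine:3.7"
  [runners.cache]
    # S3-compatible cache; bucket and credentials are placeholders
    Type = "s3"
    ServerAddress = "storage.googleapis.com"
    BucketName = "my-runner-cache"
    AccessKey = "REDACTED"
    SecretKey = "REDACTED"
    Shared = true
```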
Used GitLab Runner version
gitlab/gitlab-runner:alpine-v10.6.0