Hung Kubernetes Runner
Problem Summary
We are using GitLab SaaS with runners on Kubernetes (K8s) pods. Most of the time a runner pod terminates itself after the build finishes; however, we have noticed that builds sometimes hang and their pods stay around for days. We are also unable to check the build logs to determine the reason for the issue.
How to reproduce:
- Register a K8s runner with GitLab (in our case the master runner is gitlab-runner-gitlab-runner-6b87cb7f87-zr2mw)
- Start a build on the K8s runner (in our case runner-lyh1ijm-project-9966627-concurrent-0r4lc2)
- Perform a helm update
- Notice that the master runner pod (the one registered with GitLab) gets terminated and recreated.
- The pod list shows that the concurrent runner pod in which the build was in progress was not terminated and is left behind:
    [root@stg-cpt-jnk-slv-w2a-a gitlab]# kubectl get pods
    NAME                                               READY   STATUS    RESTARTS   AGE
    gitlab-runner-gitlab-runner-6b87cb7f87-zr2mw       1/1     Running   0          27h
    runner-lyh1ijm-project-9966627-concurrent-0r4lc2   2/2     Running   0          27h
    tiller-deploy-8cd966cd7-p992f                      1/1     Running   0          17h
- We are unable to get logs from the runner pod; the output is blank:
    [root@stg-cpt-jnk-slv-w2a-a gitlab]# kubectl logs runner-lyh1ijm-project-9966627-concurrent-0r4lc2 build
    [root@stg-cpt-jnk-slv-w2a-a gitlab]#
- The GitLab UI shows that the job remains stuck.
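To make the leftover pods easier to spot, here is a minimal sketch of a shell helper that filters the pod list for old job pods. It is only an illustration: the function name stale_job_pods and the 24-hour default are hypothetical, and it assumes that job pods are named runner-* and that AGE is the last column (as in the default kubectl get pods output shown in the steps above).

```shell
#!/bin/sh
# stale_job_pods [MIN_HOURS]
# Hypothetical helper: reads `kubectl get pods` text on stdin and prints the
# names of job pods whose AGE is at least MIN_HOURS (default 24).
stale_job_pods() {
  min_hours="${1:-24}"
  awk -v min="$min_hours" '
    $1 == "NAME" { next }          # skip the header row
    $1 !~ /^runner-/ { next }      # assumption: job pods are named runner-*
    {
      age = $NF                    # assumption: AGE is the last column
      hours = 0
      # AGE looks like 45m, 27h, 3d or 2d5h; only the d and h parts are counted
      if (match(age, /[0-9]+d/)) hours += substr(age, RSTART, RLENGTH - 1) * 24
      if (match(age, /[0-9]+h/)) hours += substr(age, RSTART, RLENGTH - 1)
      if (hours >= min) print $1
    }'
}

# Usage against a live cluster (not run here):
#   kubectl get pods | stale_job_pods 24
```

With the pod list from this report and a 24-hour threshold, only the concurrent job pod is printed; the master runner pod is skipped because its name does not start with runner-.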
Suspected reason:
During the helm update the main runner (in our case gitlab-runner-gitlab-runner-6b87cb7f87-zr2mw) gets restarted. The concurrent runner pod is left behind and, in the absence of the original master runner that was restarted during the helm update, is unable to communicate the build status back to GitLab.
Looking for:
- The reason for the hung/stuck status of the build
- The reason the runner pod is not getting terminated
- How to get the build logs, either from the GitLab UI or from the pod logs