Kubernetes executor lost track of running pods after restart

Summary

We use the Kubernetes executor with the Helm chart to run jobs in Kubernetes. The executor container was restarted (OOMKilled) and used the registration token to register a new runner, so now two runners are registered in GitLab for the same Kubernetes executor. The executor also lost track of all jobs that were running: in GitLab those jobs are still marked as running and never time out, and in the Kubernetes cluster the job pods still exist and report "Running" even though the builds have finished and CPU usage is 0. The executor does not clean up these old pods. CI jobs started after the executor container was restarted are processed normally.

Steps to reproduce

Set a low memory resource limit (e.g. 50Mi) on the runner manager. Run the Kubernetes executor and start many CI jobs. The executor container is OOM-killed and restarted. All jobs that were running are now stuck and have to be cleaned up manually.
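To reproduce, the limit can be set through the chart's `resources` value. A minimal sketch of the relevant values.yaml excerpt, assuming the standard `resources` key of the gitlab-runner chart (the exact values here are illustrative, not from our production config):

```yaml
# values.yaml excerpt (illustrative): constrain the runner manager pod
# so that it is OOM-killed under load, triggering the restart described above.
resources:
  requests:
    memory: 50Mi
  limits:
    memory: 50Mi
```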

Actual behavior

The Kubernetes executor registers a new runner even though it is already registered. After the restart, the executor does not recognize the existing job pods.

Expected behavior

The executor should not register a new runner. It should also clean up all pods it created, even when the executor process is restarted.

Environment description

The GitLab Runner is deployed with the Helm chart (version 0.23.0) and registered using the runnerRegistrationToken value in the Helm values.
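For reference, the registration-relevant part of our values looks roughly like the sketch below. The URL and token are placeholders; the key names (`gitlabUrl`, `runnerRegistrationToken`) are the ones the chart uses:

```yaml
# values.yaml excerpt (placeholders, not real credentials)
gitlabUrl: https://gitlab.example.com/
runnerRegistrationToken: "<registration token>"
```

Because registration happens on startup from this token, every restart of the manager container registers a fresh runner instead of reusing the existing one.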

Used GitLab Runner version

gitlab-runner Helm chart 0.23.0