Check-Live script causes runner pods to be restarted

In version 0-57-stable, the Check-Live script only checked whether the Runner service was running to determine if the runner was alive. In version 0-58-stable, the script was updated to also check whether the Runner is registered/online on the GitLab server.

This has introduced an issue: when the GitLab server is offline for longer than 30 seconds, the pods are restarted. After the pod comes back online, the runner attempts to register itself with the GitLab server, for a total of 30 attempts, before it gives up.

This means that customers who take their instance down for maintenance such as backups (for longer than the time it takes for the above process to complete) need to manually restart the runner pods after the instance has come back online to recover the runners.

It would be great to have a way to configure or disable this behaviour.

Customers have worked around the issue by configuring the liveness probe settings in the deployment.yml file. This is also related to the open feature request to expose these settings in the values file.
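
As a rough illustration of that workaround, the sketch below shows how the probe on the runner container could be relaxed so that a longer GitLab outage does not trigger a restart. The field names are standard Kubernetes probe settings, but the script path and every value shown are assumptions for illustration, not the chart's actual defaults. Kubernetes restarts the container after roughly `periodSeconds` × `failureThreshold` seconds of consecutive failures, so raising `failureThreshold` widens the window.

```yaml
# Hypothetical excerpt of the runner container spec in deployment.yml.
# The probe command and all values below are assumptions, not confirmed defaults.
livenessProbe:
  exec:
    command: ["/bin/bash", "/configmaps/check-live"]  # assumed script location
  initialDelaySeconds: 60
  periodSeconds: 10      # seconds between probe runs
  timeoutSeconds: 5
  failureThreshold: 60   # consecutive failures tolerated: 60 x 10 s = about 10 minutes
```

The window should be chosen to exceed the longest planned downtime of the GitLab instance, at the cost of slower detection of a runner that is genuinely dead.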
