Pods keep restarting with a Liveness probe failed error after upgrade to Runner v16.5.0

Summary

I have installed GitLab Runner on a K3s cluster by official Helm Chart connected to a local GitLab EE instance. After upgraded to latest v16.5.0 from v16.4.2, the pods keep restarting with a Liveness probe failed event.

Steps to reproduce

Deploy the v16.5.0 GitLab Runner by official Helm Chart, and wait for the pod restart.

My values.yaml

## GitLab Runner Image
##
## By default it's using registry.gitlab.com/gitlab-org/gitlab-runner:alpine-v{VERSION}
## where {VERSION} is taken from Chart.yaml from appVersion field
##
## ref: https://gitlab.com/gitlab-org/gitlab-runner/container_registry/29383?orderBy=NAME&sort=asc&search[]=alpine-v&search[]=
##
## Note: If you change the image to the ubuntu release
##       don't forget to change the securityContext;
##       these images run on different user IDs.
##
image:
  registry: registry.gitlab.com
  image: gitlab-org/gitlab-runner
  ##tag: alpine-v16.0.2

## When using GitLab Runner Helm Chart with gitlab-runner-ubi-images (https://gitlab.com/gitlab-org/ci-cd/gitlab-runner-ubi-images/container_registry)
## the installation fails because dumb-init is not packaged in the image. However, the tini is present.
## This configuration will allow gitlab-runner-ubi-images users to explicitly enabled the use of `tini` instead of `dumb-init`
useTini: false

## Specify a imagePullPolicy for the main runner deployment
## 'Always' if imageTag is 'latest', else set to 'IfNotPresent'
##
## Note: it does not apply to job containers launched by this executor.
## Use `pull_policy` in [runners.kubernetes] to change it.
##
## ref: https://kubernetes.io/docs/concepts/containers/images/#pre-pulled-images
##
imagePullPolicy: IfNotPresent

## Specifying ImagePullSecrets on a Pod
## Kubernetes supports specifying container image registry keys on a Pod.
## ref: https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod
##
# imagePullSecrets:
#   - name: "image-pull-secret"

## Timeout, in seconds, for liveness and readiness probes of a runner pod.
probeTimeoutSeconds: 5

## How many runner pods to launch.
##
## Note: Using more than one replica is not supported with a runnerToken. Use a runnerRegistrationToken
## to create multiple runner replicas.
replicas: 1

## How many old ReplicaSets for this Deployment you want to retain
# revisionHistoryLimit: 10

## The GitLab Server URL (with protocol) that want to register the runner against
## ref: https://docs.gitlab.com/runner/commands/index.html#gitlab-runner-register
##
gitlabUrl: https://gitlab.example.com
## The Registration Token for adding new Runners to the GitLab Server. This must
## be retrieved from your GitLab Instance.
## ref: https://docs.gitlab.com/ce/ci/runners/index.html
## ref: https://docs.gitlab.com/runner/register/
##
runnerRegistrationToken: "xxxxxxxx"

## The Runner Token for adding new Runners to the GitLab Server. This must
## be retrieved from your GitLab Instance. It is token of already registered runner.
## ref: (we don't yet have docs for that, but we want to use existing token)
##
# runnerToken: ""
#

## Unregister all runners before termination
##
## Updating the runner's chart version or configuration will cause the runner container
## to be terminated and created again. This may cause your Gitlab instance to reference
## non-existant runners. Un-registering the runner before termination mitigates this issue.
## ref: https://docs.gitlab.com/runner/commands/index.html#gitlab-runner-unregister
##
# unregisterRunners: true

## When stopping the runner, give it time to wait for its jobs to terminate.
##
## Updating the runner's chart version or configuration will cause the runner container
## to be terminated with a graceful stop request. terminationGracePeriodSeconds
## instructs Kubernetes to wait long enough for the runner pod to terminate gracefully.
## ref: https://docs.gitlab.com/runner/commands/#signals
terminationGracePeriodSeconds: 3600

## Set the certsSecretName in order to pass custom certficates for GitLab Runner to use
## Provide resource name for a Kubernetes Secret Object in the same namespace,
## this is used to populate the /home/gitlab-runner/.gitlab-runner/certs/ directory
## ref: https://docs.gitlab.com/runner/configuration/tls-self-signed.html#supported-options-for-self-signed-certificates-targeting-the-gitlab-server
##
# certsSecretName:

## Configure the maximum number of concurrent jobs
## ref: https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-global-section
##
concurrent: 10

## Defines in seconds how often to check GitLab for a new builds
## ref: https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-global-section
##
checkInterval: 30

## Configure GitLab Runner's logging level. Available values are: debug, info, warn, error, fatal, panic
## ref: https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-global-section
##
# logLevel:

## Configure GitLab Runner's logging format. Available values are: runner, text, json
## ref: https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-global-section
##
# logFormat:

## Configure GitLab Runner's Sentry DSN.
## ref https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-global-section
##
# sentryDsn:

## A custom bash script that will be executed prior to the invocation
## gitlab-runner process
#
#preEntrypointScript: |
#  echo "hello"

## Specify whether the runner should start the session server.
## Defaults to false
## ref:
##
## When sessionServer is enabled, the user can either provide a public publicIP
## or rely on the external IP auto discovery
## When a serviceAccountName is used with the automounting to the pod disable,
## we recommend the usage of the publicIP
sessionServer:
  enabled: false
  # annotations: {}
  # timeout: 1800
  # internalPort: 8093
  # externalPort: 9000
  # publicIP: ""
  # loadBalancerSourceRanges:
  #   - 1.2.3.4/32

## For RBAC support:https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments
rbac:
  create: true

  ## Define list of rules to be added to the rbac role permissions.
  ## Each rule supports the keys:
  ## - apiGroups: default "" (indicates the core API group) if missing or empty.
  ## - resources: default "*" if missing or empty.
  ## - verbs: default "*" if missing or empty.
  ##
  ## Read more about the recommended rules on the following link
  ##
  ## ref: https://docs.gitlab.com/runner/executors/kubernetes.html#configuring-executor-service-account
  ##
  rules:
   - resources: ["configmaps", "pods", "pods/attach", "secrets", "services"]
     verbs: ["get", "list", "watch", "create", "patch", "update", "delete"]
   - apiGroups: [""]
     resources: ["pods/exec"]
     verbs: ["create", "patch", "delete"]

  ## Run the gitlab-bastion container with the ability to deploy/manage containers of jobs
  ## cluster-wide or only within namespace
  clusterWideAccess: false

  ## Use the following Kubernetes Service Account name if RBAC is disabled in this Helm chart (see rbac.create)
  ##
  # serviceAccountName: default

  ## Specify annotations for Service Accounts, useful for annotations such as eks.amazonaws.com/role-arn
  ##
  ## ref: https://docs.aws.amazon.com/eks/latest/userguide/specify-service-account-role.html
  ##
  # serviceAccountAnnotations: {}

  ## Use podSecurity Policy
  ## ref: https://kubernetes.io/docs/concepts/policy/pod-security-policy/
  podSecurityPolicy:
    enabled: false
    resourceNames:
    - gitlab-runner

  ## Specify one or more imagePullSecrets used for pulling the runner image
  ##
  ## ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#add-imagepullsecrets-to-a-service-account
  ##
  # imagePullSecrets: []

## Configure integrated Prometheus metrics exporter
##
## ref: https://docs.gitlab.com/runner/monitoring/#configuration-of-the-metrics-http-server
##
metrics:
  enabled: false

  ## Define a name for the metrics port
  ##
  portName: metrics

  ## Provide a port number for the integrated Prometheus metrics exporter
  ##
  port: 9252

  ## Configure a prometheus-operator serviceMonitor to allow autodetection of
  ## the scraping target. Requires enabling the service resource below.
  ##
  serviceMonitor:
    enabled: false

    ## Provide additional labels to the service monitor ressource
    ##
    ## labels: {}

    ## Define a scrape interval (otherwise prometheus default is used)
    ##
    ## ref: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config
    ##
    # interval: ""

    ## Specify the scrape protocol scheme e.g., https or http
    ##
    # scheme: "http"

    ## Supply a tls configuration for the service monitor
    ##
    ## ref: https://github.com/helm/charts/blob/master/stable/prometheus-operator/crds/crd-servicemonitor.yaml
    ##
    # tlsConfig: {}

    ## The URI path where prometheus metrics can be scraped from
    ##
    # path: "/metrics"

    ## A list of MetricRelabelConfigs to apply to samples before ingestion
    ##
    ## ref: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#metric_relabel_configs
    ##
    # metricRelabelings: []

    ## A list of RelabelConfigs to apply to samples before scraping
    ##
    ## ref: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config
    ##
    ## relabelings: []

## Configure a service resource e.g., to allow scraping metrics via
## prometheus-operator serviceMonitor
service:
  enabled: false

  ## Provide additonal labels for the service
  ##
  # labels: {}

  ## Provide additonal annotations for the service
  ##
  # annotations: {}

  ## Define a specific ClusterIP if you do not want a dynamic one
  ##
  ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#choosing-your-own-ip-address
  ##
  # clusterIP: ""

  ## Define a list of one or more external IPs for this service
  ##
  ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#external-ips
  ##
  # externalIPs: []

  ## Provide a specific loadbalancerIP e.g., of an external Loadbalancer
  ##
  ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#loadbalancer
  ##
  # loadBalancerIP: ""

  ## Provide a list of source IP ranges to have access to this service
  ##
  ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#aws-nlb-support
  ##
  # loadBalancerSourceRanges: []

  ## Specify the service type e.g., ClusterIP, NodePort, Loadbalancer or ExternalName
  ##
  ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#publishing-services-service-types
  ##
  type: ClusterIP

  ## Specify the services metrics nodeport if you use a service of type nodePort
  ##
  # metrics:

    ## Specify the node port under which the prometheus metrics of the runner are made
    ## available.
    ##
    ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#nodeport
    ##
    # nodePort: ""

  ## Provide a list of additional ports to be exposed by this service
  ##
  ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#defining-a-service
  ##
  # additionalPorts: []

## Configuration for the Pods that the runner launches for each new job
##
runners:
  # runner configuration, where the multi line strings is evaluated as
  # template so you can specify helm values inside of it.
  #
  # tpl: https://helm.sh/docs/howto/charts_tips_and_tricks/#using-the-tpl-function
  # runner configuration: https://docs.gitlab.com/runner/configuration/advanced-configuration.html
  config: |
    [[runners]]
      [runners.kubernetes]
        namespace = "{{.Release.Namespace}}"
        image = "ubuntu:22.04"
        pull_policy = "if-not-present"
        [[runners.kubernetes.volumes.host_path]]
          name = "docker"
          mount_path = "/var/run/docker.sock"
          read_only = true
          host_path = "/var/run/docker.sock"


  ## Which executor should be used
  ##
  executor: kubernetes

  ## Specify whether the runner should be locked to a specific project: true, false. Defaults to true.
  ##
  # locked: true

  ## Specify the tags associated with the runner. Comma-separated list of tags.
  ##
  ## ref: https://docs.gitlab.com/ee/ci/runners/configure_runners.html#use-tags-to-control-which-jobs-a-runner-can-run
  ##
  # tags: ""

  ## Specify the name for the runner.
  ##
  # name: ""

  ## Specify the maximum timeout (in seconds) that will be set for job when using this Runner
  ##
  # maximumTimeout: ""

  ## Specify if jobs without tags should be run.
  ## If not specified, Runner will default to true if no tags were specified. In other case it will
  ## default to false.
  ##
  ## ref: https://docs.gitlab.com/ee/ci/runners/configure_runners.html#set-a-runner-to-run-untagged-jobs
  ##
  # runUntagged: true

  ## Specify whether the runner should only run protected branches.
  ## Defaults to false.
  ##
  ## ref: https://docs.gitlab.com/ee/ci/runners/configure_runners.html#prevent-runners-from-revealing-sensitive-information
  ##
  # protected: true

  ## The name of the secret containing runner-token and runner-registration-token
  # secret: gitlab-runner

  ## Distributed runners caching
  ## ref: https://docs.gitlab.com/runner/configuration/autoscale.html#distributed-runners-caching
  ##
  ## If you want to use s3 based distributing caching:
  ## First of all you need to uncomment General settings and S3 settings sections.
  ##
  ## Create a secret 's3access' containing 'accesskey' & 'secretkey'
  ## ref: https://aws.amazon.com/blogs/security/wheres-my-secret-access-key/
  ##
  ## $ kubectl create secret generic s3access \
  ##   --from-literal=accesskey="YourAccessKey" \
  ##   --from-literal=secretkey="YourSecretKey"
  ## ref: https://kubernetes.io/docs/concepts/configuration/secret/
  ##
  ## If you want to use gcs based distributing caching:
  ## First of all you need to uncomment General settings and GCS settings sections.
  ##
  ## Access using credentials file:
  ## Create a secret 'google-application-credentials' containing your application credentials file.
  ## ref: https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-runnerscachegcs-section
  ## You could configure
  ## $ kubectl create secret generic google-application-credentials \
  ##   --from-file=gcs-application-credentials-file=./path-to-your-google-application-credentials-file.json
  ## ref: https://kubernetes.io/docs/concepts/configuration/secret/
  ##
  ## Access using access-id and private-key:
  ## Create a secret 'gcsaccess' containing 'gcs-access-id' & 'gcs-private-key'.
  ## ref: https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-runnerscachegcs-section
  ## You could configure
  ## $ kubectl create secret generic gcsaccess \
  ##   --from-literal=gcs-access-id="YourAccessID" \
  ##   --from-literal=gcs-private-key="YourPrivateKey"
  ## ref: https://kubernetes.io/docs/concepts/configuration/secret/
  ##
  ## If you want to use Azure-based distributed caching:
  ## First, uncomment General settings.
  ##
  ## Create a secret 'azureaccess' containing 'azure-account-name' & 'azure-account-key'
  ## ref: https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction
  ##
  ## $ kubectl create secret generic azureaccess \
  ##   --from-literal=azure-account-name="YourAccountName" \
  ##   --from-literal=azure-account-key="YourAccountKey"
  ## ref: https://kubernetes.io/docs/concepts/configuration/secret/

  cache: {}
    ## S3 the name of the secret.
    # secretName: s3access
    ## Use this line for access using gcs-access-id and gcs-private-key
    # secretName: gcsaccess
    ## Use this line for access using google-application-credentials file
    # secretName: google-application-credentials
    ## Use this line for access using Azure with azure-account-name and azure-account-key
    # secretName: azureaccess

## Specify the name of the scheduler which used to schedule runner pods.
## Kubernetes supports multiple scheduler configurations.
## ref: https://kubernetes.io/docs/reference/scheduling
# schedulerName: "my-custom-scheduler"

## Configure securitycontext for the main container
## ref: http://kubernetes.io/docs/user-guide/security-context/
##
securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: false
  runAsNonRoot: true
  privileged: false
  capabilities:
    drop: ["ALL"]

## Configure securitycontext valid for the whole pod
## ref: http://kubernetes.io/docs/user-guide/security-context/
##
podSecurityContext:
  runAsUser: 100
  # runAsGroup: 65533
  fsGroup: 65533
  # supplementalGroups: [65533]

  ## Note: values for the ubuntu image:
  # runAsUser: 999
  # fsGroup: 999

## Configure resource requests and limits
## ref: http://kubernetes.io/docs/user-guide/compute-resources/
##
resources: {}
  # limits:
  #   memory: 256Mi
  #   cpu: 200m
  # requests:
  #   memory: 128Mi
  #   cpu: 100m

## Affinity for pod assignment
## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
##
affinity: {}

## Node labels for pod assignment
## Ref: https://kubernetes.io/docs/user-guide/node-selection/
##
nodeSelector: {}
  # Example: The gitlab runner manager should not run on spot instances so you can assign
  # them to the regular worker nodes only.
  # node-role.kubernetes.io/worker: "true"

## List of node taints to tolerate (requires Kubernetes >= 1.6)
## Ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
##
tolerations: []
  # Example: Regular worker nodes may have a taint, thus you need to tolerate the taint
  # when you assign the gitlab runner manager with nodeSelector or affinity to the nodes.
  # - key: "node-role.kubernetes.io/worker"
  #   operator: "Exists"

## Configure environment variables that will be present when the registration command runs
## This provides further control over the registration process and the config.toml file
## ref: `gitlab-runner register --help`
## ref: https://docs.gitlab.com/runner/configuration/advanced-configuration.html
##
# envVars:
#   - name: RUNNER_EXECUTOR
#     value: kubernetes

## list of hosts and IPs that will be injected into the pod's hosts file
hostAliases: []
  # Example:
  # - ip: "127.0.0.1"
  #   hostnames:
  #   - "foo.local"
  #   - "bar.local"
  # - ip: "10.1.2.3"
  #   hostnames:
  #   - "foo.remote"
  #   - "bar.remote"

## Annotations to be added to manager pod
##
podAnnotations: {}
  # Example:
  # iam.amazonaws.com/role: <my_role_arn>

## Labels to be added to manager pod
##
podLabels: {}
  # Example:
  # owner.team: <my_cool_team>

## HPA support for custom metrics:
## This section enables runners to autoscale based on defined custom metrics.
## In order to use this functionality, Need to enable a custom metrics API server by
## implementing "custom.metrics.k8s.io" using supported third party adapter
## Example: https://github.com/directxman12/k8s-prometheus-adapter
##
#hpa: {}
  # minReplicas: 1
  # maxReplicas: 10
  # metrics:
  # - type: Pods
  #   pods:
  #     metricName: gitlab_runner_jobs
  #     targetAverageValue: 400m

## Configure priorityClassName for manager pod. See k8s docs for more info on how pod priority works:
##  https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/
priorityClassName: ""

## Secrets to be additionally mounted to the containers.
## All secrets are mounted through init-runner-secrets volume
## and placed as readonly at /init-secrets in the init container
## and finally copied to an in-memory volume runner-secrets that is
## mounted at /secrets.
secrets: []
  # Example:
  # - name: my-secret
  # - name: myOtherSecret
  #   items:
  #     - key: key_one
  #       path: path_one

## Additional config files to mount in the containers in `/configmaps`.
##
## Please note that a number of keys are reserved by the runner.
## See https://gitlab.com/gitlab-org/charts/gitlab-runner/-/blob/main/templates/configmap.yaml
## for a current list.
configMaps: {}

## Additional volumeMounts to add to the runner container
##
volumeMounts: []
  # Example:
  # - name: my-volume
  #   mountPath: /mount/path

## Additional volumes to add to the runner deployment
##
volumes: []
  # Example:
  # - name: my-volume
  #   persistentVolumeClaim:
  #     claimName: my-pvc

Actual behavior

Pod reaches running state for few minutes and keeps restarting with a Liveness probe failed event

Expected behavior

GitLab Runner pods run normally without restart

Relevant logs and/or screenshots

kubectl logs output

Environment description

Custom installed runner hosted on K3s cluster.

config.toml contents

concurrent = 10
check_interval = 30
log_level = "info"
shutdown_timeout = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "gitlab-runner-5876998784-jgt2b"
  url = "https://gitlab.example.com"
  id = 2
  token = ""
  token_obtained_at = 2023-10-23T05:38:50Z
  token_expires_at = 0001-01-01T00:00:00Z
  executor = "kubernetes"
  [runners.cache]
    MaxUploadedArchiveSize = 0
  [runners.kubernetes]
    host = ""
    bearer_token_overwrite_allowed = false
    image = "ubuntu:22.04"
    namespace = "gitlab-runner"
    namespace_overwrite_allowed = ""
    pull_policy = ["if-not-present"]
    node_selector_overwrite_allowed = ""
    pod_labels_overwrite_allowed = ""
    service_account_overwrite_allowed = ""
    pod_annotations_overwrite_allowed = ""
    [runners.kubernetes.pod_security_context]
    [runners.kubernetes.init_permissions_container_security_context]
    [runners.kubernetes.build_container_security_context]
    [runners.kubernetes.helper_container_security_context]
    [runners.kubernetes.service_container_security_context]
    [runners.kubernetes.volumes]

      [[runners.kubernetes.volumes.host_path]]
        name = "docker"
        mount_path = "/var/run/docker.sock"
        read_only = true
        host_path = "/var/run/docker.sock"
    [runners.kubernetes.dns_config]

Used GitLab Runner version

v16.5.0

Possible fixes

If I downgrade back to v16.4.2 using same values.yaml, the pods run normally without restart.

Edited Oct 23, 2023 by Tony Chan