System failure with FF_PRINT_POD_EVENTS
After upgrading to v16.5.0, I wanted to test the new FF_PRINT_POD_EVENTS feature flag, but it doesn't seem to work for me.
The job fails with a system failure while preparing the environment:
```
Running with gitlab-runner 16.5.0 (853330f9)
  on DevOps Kubernetes WestUS2 Azure - AMD64 78GLy9cD6, system ID: r_JA1KverkOlMn
  feature flags: FF_RETRIEVE_POD_WARNING_EVENTS:true, FF_PRINT_POD_EVENTS:true
Resolving secrets 00:00
Preparing the "kubernetes" executor 00:00
Using Kubernetes namespace: default
Using Kubernetes executor with image debian:stable-slim ...
Using attach strategy to execute scripts...
Preparing environment 00:01
Using FF_USE_POD_ACTIVE_DEADLINE_SECONDS, the Pod activeDeadlineSeconds will be set to the job timeout: 1h0m0s...
Subscribing to Kubernetes Pod events...
ERROR: Job failed (system failure): prepare environment: unknown (get events). Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information
```
Some notable errors from the Kubernetes events for the build pod:
```
3m44s Normal Scheduled Pod/runner-78gly9cd6-project-33599309-concurrent-0-ugrjqtet Successfully assigned indigo-k8s-amd64-test-runner/runner-78gly9cd6-project-33599309-concurrent-0-ugrjqtet to aks-amd64runner-37812845-vmss000a3k
2m40s (x8 over 3m44s) Warning FailedMount Pod/runner-78gly9cd6-project-33599309-concurrent-0-ugrjqtet MountVolume.SetUp failed for volume "kube-api-access-2ngj2" : failed to fetch token: pod "runner-78gly9cd6-project-33599309-concurrent-0-ugrjqtet" not found
101s Warning FailedMount Pod/runner-78gly9cd6-project-33599309-concurrent-0-ugrjqtet Unable to attach or mount volumes: unmounted volumes=[kube-api-access-2ngj2], unattached volumes=[scripts logs docker-socket repo kube-api-access-2ngj2]: timed out waiting for the condition
```
My values.yaml:
```yaml
# Defaults from https://gitlab.com/gitlab-org/charts/gitlab-runner/blob/main/values.yaml
image:
  registry: registry.gitlab.com
  image: gitlab-org/gitlab-runner
  tag: ubuntu-v16.5.0
imagePullPolicy: Always
gitlabUrl: https://gitlab.com/
checkInterval: 3
concurrent: 360
unregisterRunners: true
terminationGracePeriodSeconds: 3600
metrics:
  enabled: true
  portName: metrics
  port: 9252
service:
  enabled: true
nodeSelector:
  kubernetes.azure.com/mode: "system"
  kubernetes.io/arch: "amd64"
tolerations:
  - key: "CriticalAddonsOnly"
    operator: "Exists"
rbac:
  create: true
  rules:
    - apiGroups: [""]
      resources: ["pods"]
      verbs: ["list", "get", "watch", "create", "delete"]
    - apiGroups: [""]
      resources: ["pods/exec"]
      verbs: ["create"]
    - apiGroups: [""]
      resources: ["pods/log"]
      verbs: ["get"]
    - apiGroups: [""]
      resources: ["pods/attach"]
      verbs: ["list", "get", "create", "delete", "update"]
    - apiGroups: [""]
      resources: ["secrets"]
      verbs: ["list", "get", "create", "delete", "update"]
    - apiGroups: [""]
      resources: ["configmaps"]
      verbs: ["list", "get", "create", "delete", "update"]
    - apiGroups: [""]
      resources: ["events"]
      verbs: ["list"]
podSecurityContext:
  runAsUser: 999
  fsGroup: 999
# Workaround cache errors https://gitlab.com/gitlab-org/gitlab-runner/-/issues/3802
preEntrypointScript: |
  sed -i '/\[runners.cache.gcs\]/d' /home/gitlab-runner/.gitlab-runner/config.toml
  sed -i '/\[runners.cache.azure\]/d' /home/gitlab-runner/.gitlab-runner/config.toml
runners:
  cache:
    secretName: s3access
  secret: gitlab-token
  tags: "small-amd64-k8s-uswest2-azure,indigo-k8s-small-amd64"
  name: "DevOps Kubernetes WestUS2 Azure - AMD64"
  config: |
    [[runners]]
      pre_build_script = '''
        # If docker CLI exists wait for dockerd to start
        # Docker is accessed via unix socket on k8s runners
        unset DOCKER_HOST
        unset DOCKER_CERT_PATH
        unset DOCKER_TLS_VERIFY
        if command -v docker &> /dev/null; then
          i=1; while [ $i -le 10 ]; do
            echo "docker command found, waiting for dockerd service $i/10..."
            docker version &> /dev/null && break
            sleep 1
            if [ $i -eq 10 ]; then
              echo "WARNING docker cli detected but dockerd service not found, continuing build..."
            fi
            i=$(( i + 1 ))
          done
        fi
      '''
      [runners.feature_flags]
        # Retrieve Pod warnings on job failure
        FF_PRINT_POD_EVENTS = true
        FF_RETRIEVE_POD_WARNING_EVENTS = true
      [runners.cache]
        Type = "s3"
        Path = ""
        Shared = false
        [runners.cache.s3]
          ServerAddress = "detoolsminio.minio:9000"
          BucketName = "gitlab-cache"
          Insecure = true
          BucketLocation = "none"
      [runners.kubernetes]
        namespace = "{{.Release.Namespace}}"
        poll_timeout = 1800
        # Default image if non-specified in .gitlab-ci.yml
        image = "debian:stable-slim"
        helper_image = "registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-v${CI_RUNNER_VERSION}"
        # Required for docker to create containers
        privileged = true
        allow_privilege_escalation = true
        pull_policy = "if-not-present"
        # Requests fit 2 jobs/node without exceeding 95% node utilization
        cpu_request = "1700m"
        cpu_limit = "1700m"
        memory_request = "5700Mi"
        memory_limit = "5700Mi"
        # Docker-in-Docker 1 job/node
        service_cpu_request = "1700m"
        service_cpu_limit = "3400m"
        service_memory_request = "5700Mi"
        service_memory_limit = "11400Mi"
        [runners.kubernetes.node_tolerations]
          "gitlab-runner=true" = "NoSchedule"
        [runners.kubernetes.node_selector]
          "kubernetes.azure.com/agentpool" = "amd64runner"
          "kubernetes.io/arch" = "amd64"
        [[runners.kubernetes.host_aliases]]
          # Workaround scale up DNS not resolving race condition
          ip = "172.65.251.78"
          hostnames = ["gitlab.com"]
        [[runners.kubernetes.volumes.empty_dir]]
          # Makes /var/run/docker.sock available to all containers in pod
          name = "docker-socket"
          mount_path = "/var/run/"
          path = "/var/run/"
          medium = "Memory"
```
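One difference that may matter: the rbac.rules in my values.yaml grant only `list` on events, and the failure happens right after "Subscribing to Kubernetes Pod events...". Assuming that subscription is a Kubernetes watch on events (an unconfirmed guess on my part), the runner's role would also need the `watch` verb. A sketch of the adjusted events entry, with the other rules left as they are:

```yaml
# Events entry in rbac.rules; the other rules stay unchanged.
# Assumption: FF_PRINT_POD_EVENTS needs to watch events, so "watch"
# is added alongside the existing "list".
- apiGroups: [""]
  resources: ["events"]
  verbs: ["list", "watch"]
```

If the job still fails with the same `unknown (get events)` error after applying this, RBAC is probably not the cause.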