Always attempt to retrieve pod warning events
What does this MR do?
The feature added in !4211 diagnoses job failures caused by Kubernetes killing a pod, but it is disabled by default. It would have helped debug, for example, this request. With this MR, GitLab Runner always attempts to retrieve the warning events. A failure to retrieve them only prints a debug message (stating that the Pod warning events could not be retrieved) to the GitLab Runner log, and only when FF_RETRIEVE_POD_WARNING_EVENTS is enabled.
We are deprecating the FF in %17.2 and will remove it entirely in %18.0.
Why was this MR needed?
To ease diagnostics when using the Kubernetes executor.
What's the best way to test this MR?
Test showing it is not a breaking change
values.yaml with MR image (event permission missing):
image:
  registry: registry.gitlab.com
  image: gitlab-org/gitlab-runner/gitlab-runner-dev@sha256
  tag: e53bcc658a0c7cfbcc7d98461e631a309f5614b8485ea10350a6a4767d0b23d1
useTini: false
imagePullPolicy: IfNotPresent
# replicas: 1
gitlabUrl: https://gitlab.com/
runnerToken: "glrt-REDACTED"
# unregisterRunners: true
## Configure the livenessProbe
livenessProbe:
  initialDelaySeconds: 30
  # periodSeconds: 10
  # successThreshold: 1
  failureThreshold: 5
## Configure the readinessProbe
readinessProbe:
  # initialDelaySeconds: 60
  periodSeconds: 25
  successThreshold: 3
  # failureThreshold: 3
useJobNamespace: true
terminationGracePeriodSeconds: 0
concurrent: 1
checkInterval: 1
logLevel: "debug"
sessionServer:
  enabled: false
  # publicIP: ""
  annotations: {}
  timeout: 1800
  internalPort: 8093
  externalPort: 9000
  # serviceType: LoadBalancer
## For RBAC support:
rbac:
  create: true
  rules:
    - apiGroups: [""]
      resources: ["configmaps", "pods", "pods/attach", "pods/log", "secrets", "services", "serviceAccounts"]
      verbs: ["get", "list", "watch", "create", "patch", "update", "delete"]
    - apiGroups: [""]
      resources: ["pods/exec"]
      verbs: ["create", "patch", "delete"]
  clusterWideAccess: false
  serviceAccountAnnotations:
    tests: ratchade-rbac
    tests-rbac: ratchade-rbac
  podSecurityPolicy:
    enabled: false
    resourceNames:
      - gitlab-runner
metrics:
  enabled: true
  portName: metrics
  port: 9252
  serviceMonitor:
    enabled: false
service:
  enabled: false
  type: ClusterIP
runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        image = "alpine:invalid"
        helper_image = "registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper-dev:x86_64-113187dc"
  runUntagged: true
  protected: true
  tags: "tests, ra-tests"
  builds: {}
  services: {}
  helpers: {}
envVars: {}
securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: false
  runAsNonRoot: true
  privileged: false
  capabilities:
    drop: ["ALL"]
podSecurityContext:
  runAsUser: 100
  runAsGroup: 65533
  fsGroup: 65533
resources:
  requests:
    memory: 10Mi
    cpu: 100m
affinity: {}
nodeSelector: {}
tolerations: []
hostAliases: []
podAnnotations: {}
podLabels: {}
hpa: {}
secrets: []
configMaps: {}
volumeMounts: []
volumes: []
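The `rbac.rules` in the values.yaml above deliberately omit the `events` resource, which is what produces the "event permission missing" condition. The sketch below checks that with simplified matching (exact strings plus the `*` wildcard, ignoring `resourceNames` and non-resource URLs). It is an illustration under those assumptions, not the Kubernetes RBAC authorizer.

```go
package main

import "fmt"

// PolicyRule mirrors the fields used in the chart's rbac.rules entries.
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// contains reports whether list holds want or the "*" wildcard.
func contains(list []string, want string) bool {
	for _, v := range list {
		if v == want || v == "*" {
			return true
		}
	}
	return false
}

// allows reports whether any rule grants verb on resource in apiGroup
// ("" is the core API group).
func allows(rules []PolicyRule, apiGroup, resource, verb string) bool {
	for _, r := range rules {
		if contains(r.APIGroups, apiGroup) && contains(r.Resources, resource) && contains(r.Verbs, verb) {
			return true
		}
	}
	return false
}

// testRules are the two rules from the values.yaml above.
var testRules = []PolicyRule{
	{
		APIGroups: []string{""},
		Resources: []string{"configmaps", "pods", "pods/attach", "pods/log", "secrets", "services", "serviceAccounts"},
		Verbs:     []string{"get", "list", "watch", "create", "patch", "update", "delete"},
	},
	{
		APIGroups: []string{""},
		Resources: []string{"pods/exec"},
		Verbs:     []string{"create", "patch", "delete"},
	},
}

func main() {
	fmt.Println(allows(testRules, "", "pods", "list"))   // true
	fmt.Println(allows(testRules, "", "events", "list")) // false: events are not granted
}
```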
Job log extract
Running with gitlab-runner 17.2.0~pre.97.g113187dc (113187dc)
on gitlab-runner-7d67cc7d7c-cw486 xxx, system ID: xxx
feature flags: FF_USE_POWERSHELL_PATH_RESOLVER:true, FF_SCRIPT_SECTIONS:true
Preparing the "kubernetes" executor
00:00
Using Kubernetes namespace: default
Using Kubernetes executor with image alpine:invalid ...
Using attach strategy to execute scripts...
Preparing environment
00:06
Using FF_USE_POD_ACTIVE_DEADLINE_SECONDS, the Pod activeDeadlineSeconds will be set to the job timeout: 30m0s...
Waiting for pod default/runner-q6ecjtkg-project-25452826-concurrent-0-u1bwhq7q to be running, status is Pending
Waiting for pod default/runner-q6ecjtkg-project-25452826-concurrent-0-u1bwhq7q to be running, status is Pending
ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
ContainersNotReady: "containers with unready status: [build helper]"
ContainersNotReady: "containers with unready status: [build helper]"
WARNING: Failed to pull image with policy "": image pull failed: Back-off pulling image "alpine:invalid"
ERROR: Job failed: prepare environment: waiting for pod running: pulling image "alpine:invalid": image pull failed: Back-off pulling image "alpine:invalid". Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information
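With the `events` permission granted, the warnings relevant to a failure like the one above are the `Warning` events whose involved object is the build pod. A sketch that builds a Kubernetes field selector for them; whether GitLab Runner queries events this way is an assumption. Pod names are DNS-safe, so no escaping of selector values is attempted.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// warningEventSelector builds a field selector matching Warning events whose
// involved object is the given pod. Keys are sorted so the output is stable.
func warningEventSelector(pod string) string {
	fields := map[string]string{
		"involvedObject.kind": "Pod",
		"involvedObject.name": pod,
		"type":                "Warning",
	}
	keys := make([]string, 0, len(fields))
	for k := range fields {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+fields[k])
	}
	return strings.Join(parts, ",")
}

func main() {
	// Pod name taken from the job log above.
	pod := "runner-q6ecjtkg-project-25452826-concurrent-0-u1bwhq7q"
	fmt.Println(warningEventSelector(pod))
}
```

The resulting string can be passed to `kubectl get events -n default --field-selector "<selector>"` to inspect the same events by hand.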
Failure to retrieve the events, as logged in the GitLab Runner logs
Error retrieving events list: events is forbidden: User "system:serviceaccount:default:gitlab-runner" cannot list resource "events" in API group "" in the namespace "default" job=7301034076 project=25452826 runner=xxx
The warning events are not retrieved, but this does not affect the outcome of the job.
What are the relevant issue numbers?
close #37834
Edited by Romuald Atchadé