Always attempt to retrieve pod warning events

Romuald Atchadé requested to merge k8s-warning-retrieval-by-default into main

What does this MR do?

!4211 (merged) was added to diagnose job failures caused by Kubernetes killing a pod, but it is disabled by default. Had it been enabled, it would have helped debug this request. With this MR, GitLab Runner always attempts to retrieve the Pod warning events. A failure to retrieve them only prints a debug message (noting that the Pod warning events could not be retrieved) in the GitLab Runner log, and only when FF_RETRIEVE_POD_WARNING_EVENTS is enabled.

We are deprecating the feature flag in %17.2 and will remove it altogether in %18.0.
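
For readers who want the debug message while the flag still exists, the fragment below is a minimal sketch of enabling it through the chart's `runners.config` block. It assumes the `[runners.feature_flags]` table of `config.toml` is honoured by the Runner version in use; it is not part of the configuration tested in this MR.

```yaml
runners:
  config: |
    [[runners]]
      # Assumption: the flag is still available (deprecated in 17.2, removal planned for 18.0).
      [runners.feature_flags]
        FF_RETRIEVE_POD_WARNING_EVENTS = true
      [runners.kubernetes]
        # Placeholder image for illustration only.
        image = "alpine:latest"
```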

Why was this MR needed?

To ease diagnostics when using the Kubernetes executor.

What's the best way to test this MR?

Test showing it is not a breaking change

values.yaml with MR image (event permission missing)
image:
  registry: registry.gitlab.com
  image: gitlab-org/gitlab-runner/gitlab-runner-dev@sha256
  tag: e53bcc658a0c7cfbcc7d98461e631a309f5614b8485ea10350a6a4767d0b23d1
useTini: false
imagePullPolicy: IfNotPresent
# replicas: 1
gitlabUrl: https://gitlab.com/
runnerToken: "glrt-REDACTED"
# unregisterRunners: true

## Configure the livenessProbe
livenessProbe:
  initialDelaySeconds: 30
#   periodSeconds: 10
#   successThreshold: 1
  failureThreshold: 5

## Configure the readinessProbe
readinessProbe:
#   initialDelaySeconds: 60
  periodSeconds: 25
  successThreshold: 3
#   failureThreshold: 3

useJobNamespace: true
terminationGracePeriodSeconds: 0
concurrent: 1
checkInterval: 1
logLevel: "debug"
sessionServer:
  enabled: false
  # publicIP: ""
  annotations: {}
  timeout: 1800
  internalPort: 8093
  externalPort: 9000
  # serviceType: LoadBalancer
## For RBAC support:
rbac:
  create: true
  rules:
    - apiGroups: [""]
      resources: ["configmaps", "pods", "pods/attach", "pods/log", "secrets", "services",  "serviceAccounts"]
      verbs: ["get", "list", "watch", "create", "patch", "update", "delete"]
    - apiGroups: [""]
      resources: ["pods/exec"]
      verbs: ["create", "patch", "delete"]
  clusterWideAccess: false
  serviceAccountAnnotations:
    tests: ratchade-rbac
    tests-rbac: ratchade-rbac
  podSecurityPolicy:
    enabled: false
    resourceNames:
    - gitlab-runner
metrics:
  enabled: true
  portName: metrics
  port: 9252
  serviceMonitor:
    enabled: false
service:
  enabled: false
  type: ClusterIP
runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        image = "alpine:invalid"
        helper_image = "registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper-dev:x86_64-113187dc"
  runUntagged: true
  protected: true
  tags: "tests, ra-tests"
  builds: {}
  services: {}
  helpers: {}
envVars: {}
securityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: false
  runAsNonRoot: true
  privileged: false
  capabilities:
    drop: ["ALL"]
podSecurityContext:
  runAsUser: 100
  runAsGroup: 65533
  fsGroup: 65533
resources:
  requests:
    memory: 10Mi
    cpu: 100m
affinity: {}
nodeSelector: {}
tolerations: []
hostAliases: []
podAnnotations: {}
podLabels: {}
hpa: {}
secrets: []
configMaps: {}
volumeMounts: []
volumes: []

Job

Extract
Running with gitlab-runner 17.2.0~pre.97.g113187dc (113187dc)
  on gitlab-runner-7d67cc7d7c-cw486 xxx, system ID: xxx
  feature flags: FF_USE_POWERSHELL_PATH_RESOLVER:true, FF_SCRIPT_SECTIONS:true
Preparing the "kubernetes" executor
00:00
Using Kubernetes namespace: default
Using Kubernetes executor with image alpine:invalid ...
Using attach strategy to execute scripts...
Preparing environment
00:06
Using FF_USE_POD_ACTIVE_DEADLINE_SECONDS, the Pod activeDeadlineSeconds will be set to the job timeout: 30m0s...
Waiting for pod default/runner-q6ecjtkg-project-25452826-concurrent-0-u1bwhq7q to be running, status is Pending
Waiting for pod default/runner-q6ecjtkg-project-25452826-concurrent-0-u1bwhq7q to be running, status is Pending
	ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
	ContainersNotReady: "containers with unready status: [build helper]"
	ContainersNotReady: "containers with unready status: [build helper]"
WARNING: Failed to pull image with policy "": image pull failed: Back-off pulling image "alpine:invalid"
ERROR: Job failed: prepare environment: waiting for pod running: pulling image "alpine:invalid": image pull failed: Back-off pulling image "alpine:invalid". Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information

Failure to retrieve the events, as logged in the GitLab Runner logs

Error retrieving events list: events is forbidden: User "system:serviceaccount:default:gitlab-runner" cannot list resource "events" in API group "" in the namespace "default"  job=7301034076 project=25452826 runner=xxx

The warning events are not retrieved, and this does not affect the outcome of the job.
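
The error above shows that the runner's service account cannot list `events` in the core API group. As a sketch only, an extra entry under `rbac.rules` in the values.yaml above would grant that permission. The `list` verb is taken directly from the error message; whether other verbs are needed is not established by this test, and the rule below was not part of the tested configuration.

```yaml
rbac:
  create: true
  rules:
    # ... existing rules from the values.yaml above ...
    # Assumption: "list" on events in the core API group is enough to
    # resolve the "events is forbidden" error shown in the runner log.
    - apiGroups: [""]
      resources: ["events"]
      verbs: ["list"]
```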

What are the relevant issue numbers?

Closes #37834 (closed)

Edited by Romuald Atchadé
