Switch deletion propagation to background for Pod's dependents

Romuald Atchadé requested to merge k8s-owner-reference-management into main

What does this MR do?

In order to reduce the resources left over when the job Pod is deleted, OwnerReference support was implemented for the Kubernetes executor (see !2983 (merged)). Generally speaking, when the job finishes, the job Pod is successfully deleted.
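
For context, here is a minimal sketch in Go, assuming client-go, of how a dependent object (for example a credentials Secret) is tied to the build Pod through metadata.ownerReferences so that the garbage collector removes it once the Pod is deleted. The package and function names (podOwnerReference, attachOwner) are illustrative, not the runner's actual implementation:

package example

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// podOwnerReference builds an OwnerReference pointing at the given build Pod.
func podOwnerReference(pod *corev1.Pod) metav1.OwnerReference {
	return metav1.OwnerReference{
		APIVersion: "v1",
		Kind:       "Pod",
		Name:       pod.GetName(),
		UID:        pod.GetUID(),
	}
}

// attachOwner marks the Secret as a dependent of the build Pod, so it is
// garbage-collected once the Pod is deleted.
func attachOwner(secret *corev1.Secret, pod *corev1.Pod) {
	secret.SetOwnerReferences(append(secret.GetOwnerReferences(), podOwnerReference(pod)))
}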

However, in the use case described in issue #29291 (closed), using the Foreground policy for deletion propagation prevents the deletion of the job Pod, which then gets stuck in the Terminating state.

IMO, there were three options to handle this:

  • Add a configuration option to disable the OwnerReference setting altogether
  • Switch from foreground to background propagation, as the problem doesn't occur with that policy
  • Manually delete all the resources, as was done prior to MR !2983 (merged)

I went with the second option. With the background policy, according to the Kubernetes documentation:

In background cascading deletion, the Kubernetes API server deletes the owner object immediately and the controller cleans up the dependent objects in the background. By default, Kubernetes uses background cascading deletion unless you manually use foreground deletion or choose to orphan the dependent objects.

The Kubernetes leftovers are still deleted after the owner is removed.
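
Concretely, the switch boils down to passing a background propagation policy in the DeleteOptions when the job Pod is removed. A minimal sketch with client-go (the function and variable names are illustrative, not the runner's actual code):

package example

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deletePodInBackground deletes the job Pod with background cascading deletion:
// the API server removes the Pod immediately and the garbage collector cleans up
// its dependents afterwards, instead of keeping the Pod in Terminating until all
// dependents are gone, as foreground deletion does.
func deletePodInBackground(ctx context.Context, c kubernetes.Interface, namespace, name string) error {
	propagation := metav1.DeletePropagationBackground
	return c.CoreV1().Pods(namespace).Delete(ctx, name, metav1.DeleteOptions{
		PropagationPolicy: &propagation,
	})
}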

Why was this MR needed?

Prevent the job Pod from getting stuck in the Terminating state.

What's the best way to test this MR?

  • Install the latest version of the GitLab Runner Helm Chart. This was tested with a cluster on GKE. Use the following image for the test: registry.gitlab.com/gitlab-org/gitlab-runner:alpine3.17-k8s-owner-reference-management
values.yaml
image:
  registry: registry.gitlab.com
  image: gitlab-org/gitlab-runner
  tag: alpine3.17-k8s-owner-reference-management
useTini: false
imagePullPolicy: IfNotPresent
replicas: 1
gitlabUrl: https://gitlab.com/
runnerToken: "__REDACTED__"
unregisterRunners: true
terminationGracePeriodSeconds: 0
concurrent: 1
checkInterval: 1
logLevel: "debug"
sessionServer:
  enabled: false
  annotations: {}
rbac:
  create: true
  rules:
    - apiGroups: [""]
      resources: ["events", "pods", "pods/attach", "secrets", "services",  "serviceAccounts"]
      verbs: ["get", "list", "watch", "create", "patch", "update", "delete"]
    - apiGroups: [""]
      resources: ["pods/exec"]
      verbs: ["create", "patch", "delete"]
  clusterWideAccess: false
  podSecurityPolicy:
    enabled: false
    resourceNames:
    - gitlab-runner
metrics:
  enabled: true
  portName: metrics
  port: 9252
  serviceMonitor:
    enabled: false
service:
  enabled: false
  type: ClusterIP
runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        image = "alpine"
        memory_request = "10Mi"
  builds: {}
  services: {}
  helpers: {}
securityContext:
  allowPrivilegeEscalation: true
  readOnlyRootFilesystem: false
  runAsNonRoot: true
podSecurityContext:
  runAsUser: 100
  fsGroup: 65533
resources:
  requests:
    memory: 10Mi
    cpu: 100m
affinity: {}
nodeSelector: {}
tolerations: []
hostAliases: []
podAnnotations: {}
podLabels: {}
hpa: {}
secrets: []
configMaps: {}
volumeMounts: []
volumes: []
  • Deploy the digester webhook to your cluster
  • Run any pipeline using the newly installed runner
  • The job Pod should be gone once the job succeeds/fails

What are the relevant issue numbers?

Fixes #29291 (closed)
