Switch deletion propagation to background for Pod's dependents

Romuald Atchadé requested to merge k8s-owner-reference-management into main

What does this MR do?

In order to reduce the resources left over when the job Pod is deleted, OwnerReference support was implemented for the Kubernetes executor (see !2983 (merged)). Generally speaking, when the job finishes, the job Pod is successfully deleted.
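
For context, here is a minimal sketch in Go, assuming client-go, of how a dependent object (for example a credentials Secret) is tied to the build Pod through metadata.ownerReferences so that the garbage collector removes it once the Pod is deleted. The package and function names (podOwnerReference, attachOwner) are illustrative, not the runner's actual implementation:

package example

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// podOwnerReference builds an OwnerReference pointing at the given build Pod.
func podOwnerReference(pod *corev1.Pod) metav1.OwnerReference {
	return metav1.OwnerReference{
		APIVersion: "v1",
		Kind:       "Pod",
		Name:       pod.GetName(),
		UID:        pod.GetUID(),
	}
}

// attachOwner marks the Secret as a dependent of the build Pod, so it is
// garbage-collected once the Pod is deleted.
func attachOwner(secret *corev1.Secret, pod *corev1.Pod) {
	secret.SetOwnerReferences(append(secret.GetOwnerReferences(), podOwnerReference(pod)))
}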

However, in the use case described in issue #29291 (closed), using the Foreground policy for deletion propagation prevents the deletion of the job Pod, which then gets stuck in the Terminating state.

IMO, there were three options to handle this:

  • Add a configuration option to disable the OwnerReference setting altogether
  • Switch from foreground to background propagation, as the problem doesn't occur with that policy
  • Manually delete all the resources, as was done prior to MR !2983 (merged)

I went with the second option. With the background policy, according to the Kubernetes documentation:

In background cascading deletion, the Kubernetes API server deletes the owner object immediately and the controller cleans up the dependent objects in the background. By default, Kubernetes uses background cascading deletion unless you manually use foreground deletion or choose to orphan the dependent objects.

The Kubernetes leftovers are still deleted after the owner is removed.
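
Concretely, the switch boils down to passing a background propagation policy in the DeleteOptions when the job Pod is removed. A minimal sketch with client-go (the function and variable names are illustrative, not the runner's actual code):

package example

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deletePodInBackground deletes the job Pod with background cascading deletion:
// the API server removes the Pod immediately and the garbage collector cleans up
// its dependents afterwards, instead of keeping the Pod in Terminating until all
// dependents are gone, as foreground deletion does.
func deletePodInBackground(ctx context.Context, c kubernetes.Interface, namespace, name string) error {
	propagation := metav1.DeletePropagationBackground
	return c.CoreV1().Pods(namespace).Delete(ctx, name, metav1.DeleteOptions{
		PropagationPolicy: &propagation,
	})
}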

Why was this MR needed?

Prevent the job Pod from getting stuck in the Terminating state.

What's the best way to test this MR?

  • Install the latest version of the GitLab Runner Helm Chart. This was tested with a cluster on GKE. Use the following image for the test: registry.gitlab.com/gitlab-org/gitlab-runner:alpine3.17-k8s-owner-reference-management
values.yaml
image:
  registry: registry.gitlab.com
  image: gitlab-org/gitlab-runner
  tag: alpine3.17-k8s-owner-reference-management
useTini: false
imagePullPolicy: IfNotPresent
replicas: 1
gitlabUrl: https://gitlab.com/
runnerToken: "__REDACTED__"
unregisterRunners: true
terminationGracePeriodSeconds: 0
concurrent: 1
checkInterval: 1
logLevel: "debug"
sessionServer:
  enabled: false
  annotations: {}
rbac:
  create: true
  rules:
    - apiGroups: [""]
      resources: ["events", "pods", "pods/attach", "secrets", "services",  "serviceAccounts"]
      verbs: ["get", "list", "watch", "create", "patch", "update", "delete"]
    - apiGroups: [""]
      resources: ["pods/exec"]
      verbs: ["create", "patch", "delete"]
  clusterWideAccess: false
  podSecurityPolicy:
    enabled: false
    resourceNames:
    - gitlab-runner
metrics:
  enabled: true
  portName: metrics
  port: 9252
  serviceMonitor:
    enabled: false
service:
  enabled: false
  type: ClusterIP
runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        image = "alpine"
        memory_request = "10Mi"
  builds: {}
  services: {}
  helpers: {}
securityContext:
  allowPrivilegeEscalation: true
  readOnlyRootFilesystem: false
  runAsNonRoot: true
podSecurityContext:
  runAsUser: 100
  fsGroup: 65533
resources:
  requests:
    memory: 10Mi
    cpu: 100m
affinity: {}
nodeSelector: {}
tolerations: []
hostAliases: []
podAnnotations: {}
podLabels: {}
hpa: {}
secrets: []
configMaps: {}
volumeMounts: []
volumes: []
  • Deploy the digester webhook to your cluster
  • Run any pipeline using the newly installed runner
  • The job Pod should be gone once the job succeeds/fails

What are the relevant issue numbers?

Fixes #29291 (closed)
