kubernetes: add PodDisruptionBudget support for job pods
What does this MR do and why?
Adds optional PodDisruptionBudget (PDB) support for Kubernetes executor job pods to prevent voluntary evictions during node drains and cluster upgrades.
Problem
When running CI jobs on Kubernetes, node drains (during upgrades, autoscaling, or maintenance) can evict job pods and cause the jobs to fail. There is currently no protection against this.
Solution
When enabled, the executor creates a PodDisruptionBudget with minAvailable: 1 for each job pod. This prevents the Kubernetes eviction API from evicting the pod during voluntary disruptions while still allowing:
- Pod termination when the job completes
- Involuntary disruptions (node failures, OOM kills)
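To illustrate why minAvailable: 1 blocks eviction of a single job pod, here is a minimal Go sketch of the budget arithmetic. It is a simplified model for an integer minAvailable, not the runner's or Kubernetes' actual code, and the function name disruptionsAllowed is hypothetical.

```go
package main

import "fmt"

// disruptionsAllowed is a simplified, hypothetical model of the budget check
// for a PDB with an integer minAvailable: evictions are permitted only while
// the number of healthy pods exceeds minAvailable.
func disruptionsAllowed(currentHealthy, minAvailable int) int {
	allowed := currentHealthy - minAvailable
	if allowed < 0 {
		return 0
	}
	return allowed
}

func main() {
	// One job pod guarded by minAvailable: 1 leaves no room for eviction.
	fmt.Println(disruptionsAllowed(1, 1)) // prints 0

	// For comparison, three matching pods with minAvailable: 1 could lose two.
	fmt.Println(disruptionsAllowed(3, 1)) // prints 2
}
```

Because each PDB selects exactly one job pod, the allowed disruption count stays at zero for as long as that pod is healthy, so an eviction request made during a drain is refused rather than honored.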
Configuration
[runners.kubernetes]
pod_disruption_budget = true # disabled by default
Or via environment variable:
KUBERNETES_POD_DISRUPTION_BUDGET=true
How it works
- After creating the job pod, if pod_disruption_budget is enabled, a PDB is created with:
  - minAvailable: 1 - prevents eviction of the single job pod
  - Label selector matching the job pod's unique label (job.runner.gitlab.com/pod)
  - OwnerReference pointing to the pod (automatic garbage collection)
- The PDB is automatically deleted when the pod is deleted (via ownerReference)
- Fallback cleanup in cleanupResources() if ownerReference wasn't set
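The steps above can be sketched in Go roughly as follows. This is an illustration only: the struct types are hypothetical stand-ins for the Kubernetes policy/v1 and meta/v1 API types (the real implementation presumably uses the official client libraries), and buildPDB, its parameters, and the sample values are assumptions rather than code from this MR. The "-pdb" name suffix follows the example manifest in this description.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical stand-ins for the Kubernetes API types, reduced to the
// fields this description mentions.
type ownerReference struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Name       string `json:"name"`
	UID        string `json:"uid"`
}

type objectMeta struct {
	Name            string           `json:"name"`
	Namespace       string           `json:"namespace"`
	OwnerReferences []ownerReference `json:"ownerReferences,omitempty"`
}

type labelSelector struct {
	MatchLabels map[string]string `json:"matchLabels"`
}

type pdbSpec struct {
	MinAvailable int           `json:"minAvailable"`
	Selector     labelSelector `json:"selector"`
}

type podDisruptionBudget struct {
	APIVersion string     `json:"apiVersion"`
	Kind       string     `json:"kind"`
	Metadata   objectMeta `json:"metadata"`
	Spec       pdbSpec    `json:"spec"`
}

// podLabel is the job pod's unique label key named in this description.
const podLabel = "job.runner.gitlab.com/pod"

// buildPDB (hypothetical helper) assembles a PDB that selects one job pod
// and is owned by it, so deleting the pod garbage-collects the PDB.
func buildPDB(podName, podUID, namespace, labelValue string) podDisruptionBudget {
	return podDisruptionBudget{
		APIVersion: "policy/v1",
		Kind:       "PodDisruptionBudget",
		Metadata: objectMeta{
			Name:      podName + "-pdb",
			Namespace: namespace,
			OwnerReferences: []ownerReference{
				{APIVersion: "v1", Kind: "Pod", Name: podName, UID: podUID},
			},
		},
		Spec: pdbSpec{
			MinAvailable: 1,
			Selector: labelSelector{
				MatchLabels: map[string]string{podLabel: labelValue},
			},
		},
	}
}

func main() {
	// Sample values are placeholders, not real identifiers.
	pdb := buildPDB("runner-abc123", "example-pod-uid", "gitlab-runner", "example-job-name")
	out, err := json.MarshalIndent(pdb, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Setting the owner reference when the PDB is built lets Kubernetes garbage collection remove it with the pod, which is why the explicit cleanup in cleanupResources() is only needed as a fallback.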
Example PDB created
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: runner-abc123-pdb
  namespace: gitlab-runner
  ownerReferences:
    - apiVersion: v1
      kind: Pod
      name: runner-abc123
      uid: <pod-uid>
spec:
  minAvailable: 1
  selector:
    matchLabels:
      job.runner.gitlab.com/pod: <job-unique-name>
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
- Tests added for new functionality
- Documentation updated