Kubernetes: autoscaler for idle capacity via pause pods
What does this MR do?
Adds pause pod autoscaling for the Kubernetes executor to pre-warm cluster capacity.
Problem
Jobs running on the Kubernetes executor may experience delays waiting for pods to be scheduled when node capacity is exhausted. The cluster autoscaler needs time to provision new nodes, causing job startup latency.
Solution
Implements pause pod management based on [[runners.kubernetes.autoscaler.policy]] configuration (matching the existing fleeting/taskscaler pattern). Pause pods reserve cluster capacity that can be quickly preempted when real jobs arrive, reducing job startup latency.
Key components:
- Policy & Scheduling: Cron-based policy selection using gitlab.com/gitlab-org/fleeting/taskscaler/cron (requires gitlab-org/fleeting/taskscaler!71 (merged) to be merged first)
- Pause Pod Manager: Manages a Deployment of pause pods, reconciling replica count based on active policy
- Provider Integration: Wraps the Kubernetes executor provider to add ManagedExecutorProvider lifecycle hooks
Configuration
[runners.kubernetes.autoscaler]
max_pause_pods = 10
pause_pod_image = "registry.k8s.io/pause:3.10" # optional, this is the default
[[runners.kubernetes.autoscaler.policy]]
idle_count = 5
idle_time = "30m"
periods = ["* 8-17 * * mon-fri"]
timezone = "UTC"
[[runners.kubernetes.autoscaler.policy]]
idle_count = 0
periods = ["* * * * *"] # default fallback
How it works
- The pause pod manager runs a reconciliation loop every 10 seconds
- It evaluates which policy is active based on current time and cron periods
- Calculates desired replica count using idle_count and optional scale_factor
- Creates/updates a Deployment to maintain the target number of pause pods
- Pause pods use a low priority class so they are preempted when real jobs need capacity
- On shutdown, the Deployment is cleaned up
Related issues
Relates to gitlab-com/gl-infra/production-engineering#28168
Dependencies
Author's checklist
- Tests added for new functionality
- Documentation added
- RBAC docs generator updated to support the apps API group
Edited by Igor