Support activeDeadlineSeconds for pods spawned by kubernetes executor
Description
Problem / Use Case
In a multi-tenant deployment of Kubernetes, some form of resource quota enforcement must be implemented to prevent service abuse. OpenShift implements multiple quota scopes utilizing a Pod resource's activeDeadlineSeconds property. Having separate quotas (Terminating for ephemeral pods and NotTerminating for longer-lived pods) helps to improve utilization density and reduce over-provisioning of compute resources.
The kubernetes executor in GitLab Runner allows GitLab CI jobs to run inside Kubernetes pods ephemerally. However, these pods are not recognized as short-lived by Kubernetes/OpenShift and thus do not benefit from all of the platforms' built-in features. For end-users reaching the limits of their resource quotas, this results in failed GitLab CI jobs.
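As a rough illustration of the scope matching described above, the following Go sketch classifies a pod by whether activeDeadlineSeconds is set. The `podSpec` type and the `quotaScope` function are hypothetical stand-ins invented for this example; they are not part of the Kubernetes or GitLab Runner APIs. The sketch assumes that a pod with the field set matches the Terminating scope and a pod with the field unset matches NotTerminating.

```go
package main

import "fmt"

// podSpec is a hypothetical, pared-down stand-in for a Kubernetes pod spec.
// Only the field relevant to quota scopes is modeled here.
type podSpec struct {
	ActiveDeadlineSeconds *int64
}

// quotaScope returns the quota scope a pod would fall under in this sketch:
// "Terminating" when activeDeadlineSeconds is set, "NotTerminating" when it
// is unset. Kubernetes requires a positive value whenever the field is set,
// so a non-nil pointer is treated as a valid deadline.
func quotaScope(spec podSpec) string {
	if spec.ActiveDeadlineSeconds != nil {
		return "Terminating"
	}
	return "NotTerminating"
}

func main() {
	deadline := int64(3600)

	// A pod as currently created by the executor: no deadline set.
	fmt.Println(quotaScope(podSpec{})) // NotTerminating

	// The same pod with a one-hour deadline.
	fmt.Println(quotaScope(podSpec{ActiveDeadlineSeconds: &deadline})) // Terminating
}
```

Under this model, CI job pods without a deadline are counted against the same quota as long-lived workloads, which is the situation this issue describes.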
Goals
- Support OpenShift Quota Scopes (Terminating, NotTerminating, etc.)
- Leverage Kubernetes built-in pod lifecycle management
Benefits
- Users of the Kubernetes executor of GitLab Runner in a multi-tenant OpenShift environment will have fewer failed jobs from exceeding quota
- Potentially reduced job runtime as a result of faster pod scheduling (OpenShift)
- Reduced resource contention with long-lived pods (OpenShift)
- Simplified and more robust pod lifecycle management (Kubernetes)
Proposal
Extend the kubernetes executor of GitLab Runner to add support for activeDeadlineSeconds.
- The value for activeDeadlineSeconds will be set based on the timeout setting of the job, project, and runner.
- The feature will be put behind a feature flag (disabled by default).
- By leveraging an existing GitLab CI setting, no additional configuration is needed by the end-user.
- On the backend, all associated pods that are spawned by the kubernetes executor have activeDeadlineSeconds set, allowing Kubernetes to automatically manage their lifecycle. Additionally, on multi-tenant OpenShift deployments with enforced quotas, the presence of the activeDeadlineSeconds setting results in pods having the Terminating Quota Scope (instead of NotTerminating).
Links to related issues and merge requests / references
- #3259
- https://docs.openshift.com/container-platform/4.11/applications/quotas/quotas-setting-per-project.html#quotas-scopes_quotas-setting-per-project
- https://docs.openshift.com/container-platform/4.11/nodes/jobs/nodes-nodes-jobs.html#jobs-set-max_nodes-nodes-jobs
- https://kubernetes.io/docs/concepts/workloads/controllers/job/#job-termination-and-cleanup
- https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#lifecycle
- https://docs.gitlab.com/ee/ci/yaml/#timeout
- https://docs.gitlab.com/ee/ci/pipelines/settings.html#set-a-limit-for-how-long-jobs-can-run
- https://docs.gitlab.com/ee/ci/runners/configure_runners.html#set-maximum-job-timeout-for-a-runner
~"type::feature".