Support activeDeadlineSeconds for pods spawned by kubernetes executor

Description

Problem / Use Case

In a multi-tenant deployment of Kubernetes, some form of resource quota enforcement must be implemented to prevent service abuse. OpenShift implements multiple quota scopes utilizing a Pod resource's activeDeadlineSeconds property. Having separate quotas (Terminating for ephemeral pods and NotTerminating for longer-lived pods) helps to improve utilization density and reduce over-provisioning of compute resources.
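
As a rough illustration of how the two scopes divide pods, the sketch below mirrors the documented Kubernetes rule: a pod with activeDeadlineSeconds set (>= 0) falls under Terminating, and a pod without it falls under NotTerminating. The `PodSpec` type and `QuotaScope` function are hypothetical stand-ins written for this example; they are not part of the Kubernetes client libraries or GitLab Runner.

```go
package main

import "fmt"

// PodSpec is a hypothetical, trimmed-down stand-in for the Kubernetes
// PodSpec. Only the field relevant to quota scopes is modelled here.
type PodSpec struct {
	ActiveDeadlineSeconds *int64
}

// QuotaScope reports which quota scope a pod would be counted under:
// "Terminating" when activeDeadlineSeconds is set (>= 0),
// "NotTerminating" when it is unset.
func QuotaScope(spec PodSpec) string {
	if spec.ActiveDeadlineSeconds != nil && *spec.ActiveDeadlineSeconds >= 0 {
		return "Terminating"
	}
	return "NotTerminating"
}

func main() {
	deadline := int64(3600)
	fmt.Println(QuotaScope(PodSpec{}))                                 // NotTerminating
	fmt.Println(QuotaScope(PodSpec{ActiveDeadlineSeconds: &deadline})) // Terminating
}
```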

The kubernetes executor in GitLab Runner runs GitLab CI jobs in ephemeral Kubernetes pods. However, Kubernetes/OpenShift does not recognize these pods as short-lived, so they do not benefit from all of the platform's built-in features. For end users who reach the limits of their resource quotas, this results in failed GitLab CI jobs.

Goals

  • Support OpenShift Quota Scopes (Terminating, NotTerminating, etc.)
  • Leverage Kubernetes built-in pod lifecycle management

Benefits

  • Users of the Kubernetes executor of GitLab Runner in a multi-tenant OpenShift environment will have fewer jobs fail from exceeding quota
  • Potentially reduced job runtime as a result of faster pod scheduling (OpenShift)
  • Reduced resource contention with long-lived pods (OpenShift)
  • Simplified and more robust pod lifecycle management (Kubernetes)

Proposal

Extend the kubernetes executor of GitLab Runner to add support for activeDeadlineSeconds.

  • The value of activeDeadlineSeconds will be derived from the timeout settings of the job, project, and runner.

  • The feature will be put behind a feature flag (disabled by default).

  • Because the feature reuses an existing GitLab CI setting, end users need no additional configuration.

  • On the backend, all pods spawned by the kubernetes executor will have activeDeadlineSeconds set, allowing Kubernetes to manage their lifecycle automatically. Additionally, on multi-tenant OpenShift deployments with enforced quotas, the presence of the activeDeadlineSeconds setting places pods in the Terminating Quota Scope (instead of NotTerminating).

Links to related issues and merge requests / references

~"type::feature".
Edited by Darren Eastman