Provision pod at resource quota with GitLab Runner Kubernetes executor

Description

The Kubernetes executor spins up ephemeral runner pods in a Kubernetes namespace, and a namespace may set a ResourceQuota on resource limits. For example, suppose a namespace's ResourceQuota sets limits.cpu to 100 and 20 runner pods are already running, each with a CPU limit of 5. The namespace is then at its quota. When the Kubernetes executor tries to create a new pod for the next CI job, the request exceeds the quota and the job fails with an error like:

ERROR: Job failed (system failure): prepare environment: setting up build pod: pods "runner-sqplrhbg-project-xxxxxxx-concurrent-3nmmww" is forbidden: exceeded quota: cpu-mem-quota, requested: limits.cpu=5, used: limits.cpu=716500m, limited: limits.cpu=720. Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information
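
The rejection comes down to simple arithmetic: the quota check compares the namespace's current usage plus the new pod's limit against the hard limit. In the scenario above, 20 pods at 5 CPU already use exactly 100, so any further pod is refused. The sketch below reproduces that comparison for both the scenario and the figures in the error message. It is an illustrative approximation, not Kubernetes code: to_millicores and exceeds_quota are hypothetical helpers, and to_millicores only understands plain CPU counts and the m (millicore) suffix, not the full Kubernetes quantity syntax.

```python
# Illustrative sketch of the limits.cpu quota comparison, in millicores (1 CPU = 1000m).
# Hypothetical helpers; this is not the Kubernetes admission code itself.

def to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "5", "0.5" or "716500m" to millicores.

    Only plain numbers and the "m" suffix are supported (an assumption for this sketch).
    """
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def exceeds_quota(used: str, requested: str, hard: str) -> bool:
    """Return True if adding `requested` on top of `used` would go over `hard`."""
    return to_millicores(used) + to_millicores(requested) > to_millicores(hard)


# Scenario from the description: hard limit 100 CPU, 20 pods x 5 CPU already running.
print(exceeds_quota(str(20 * 5), "5", "100"))  # True

# Figures from the error message: 716500m + 5000m = 721500m > 720000m.
print(exceeds_quota("716500m", "5", "720"))  # True
```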

Proposal

Instead of failing on the first attempt, the executor could use an incremental backoff mechanism to retry creating the job pod at a later time. The number of attempts could be configurable, for example retrying 3 times, or rescheduling indefinitely until the namespace quota allows the pod.
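
A minimal sketch of the proposed retry loop, assuming pod creation can be wrapped in a callable that raises a distinguishable error when the quota rejects it. create_pod_with_backoff, QuotaExceededError and all parameter names are hypothetical and are not existing GitLab Runner settings. The delay doubles after each rejected attempt up to a cap, and max_attempts=None models "keep rescheduling until the quota allows it".

```python
import time
from typing import Callable, Optional


class QuotaExceededError(Exception):
    """Hypothetical error raised when pod creation is rejected by a ResourceQuota."""


def create_pod_with_backoff(
    create_pod: Callable[[], str],
    max_attempts: Optional[int] = 3,
    initial_delay: float = 5.0,
    max_delay: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry create_pod with exponential backoff while the quota is exhausted.

    max_attempts=None retries until the pod is admitted.
    Errors other than QuotaExceededError propagate immediately.
    """
    delay = initial_delay
    attempt = 0
    while True:
        attempt += 1
        try:
            return create_pod()
        except QuotaExceededError:
            if max_attempts is not None and attempt >= max_attempts:
                raise  # out of attempts: fail the job, as happens today
            sleep(delay)  # wait for running jobs to release quota
            delay = min(delay * 2, max_delay)  # exponential growth, capped


# Usage: a fake create_pod that is rejected twice, then admitted.
calls = {"n": 0}


def flaky_create_pod() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise QuotaExceededError("exceeded quota: cpu-mem-quota")
    return "runner-pod-1"


delays = []
print(create_pod_with_backoff(flaky_create_pod, sleep=delays.append))  # runner-pod-1
print(delays)  # [5.0, 10.0]
```

Retrying only on the quota error keeps unrelated failures from being delayed. When many jobs are waiting at once, adding random jitter to each delay is a common way to keep them from retrying in lockstep.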

One potential issue with this approach: if the namespace resource quota stays fully saturated indefinitely, a large backlog of pending jobs may build up in the executor. Even so, in some situations this may be a better experience than simply failing the job.
