Kubernetes executor: resource limits for init container

Status Update (2023-03-22)

You can now apply this configuration using the custom pod spec feature. The feature is currently behind the feature flag FF_USE_ADVANCED_POD_SPEC_CONFIGURATION.

Config Example

[[runners]]
  environment = ["FF_USE_ADVANCED_POD_SPEC_CONFIGURATION=true"]
  [runners.kubernetes]
    image = "alpine"
    [[runners.kubernetes.pod_spec]]
      name = "tester"
      patch_type = "strategic"
      patch = '''
      initContainers:
        - name: "init-permissions"
          resources:
            limits:
              cpu: "..."
            requests:
              cpu: "..."
      '''

We are also capturing feedback on this feature in the issue Overwrite generated Kubernetes pod specificatio... (#29659 - closed)

Documentation

Release notes

The Kubernetes executor in GitLab Runner can now be configured to define resource limits for the init container. Because this was the last place where resource limits weren't configurable, you can now use ResourceQuota objects in the Kubernetes namespace your CI jobs run in.

Problem to solve

Trying to limit resources for the gitlab-runner namespace with ResourceQuota objects in Kubernetes fails with:

ERROR: Job failed (system failure): prepare environment: setting up build pod: pods "runner-lhqxxcua-project-3-concurrent-0zjxfl" is forbidden: failed quota: runners: must specify limits.cpu,limits.memory. Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information

After some digging around, I found the init container does not have any resource limits set and configuring them is not possible.

Proposal

Add configuration options init_cpu_limit, init_cpu_requests, init_memory_limit and so on, mirroring the existing cpu_limit, helper_cpu_limit, service_cpu_limit and similar options. Use the values configured there as resource limits/requests for the init container.
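As a rough sketch of what this proposal could look like in config.toml: the init_* keys below are the names suggested in this issue and are hypothetical, not confirmed GitLab Runner options, and all values are placeholders chosen only for illustration.

```toml
# Sketch only: the init_* keys are the names proposed in this issue
# and are NOT confirmed GitLab Runner options.
[[runners]]
  [runners.kubernetes]
    image = "alpine"

    # Existing options named in this issue (values are illustrative).
    cpu_limit = "1"
    helper_cpu_limit = "500m"

    # Proposed options for the init container (hypothetical names,
    # illustrative values).
    init_cpu_limit = "100m"
    init_cpu_requests = "50m"
    init_memory_limit = "64Mi"
```

With every container in the pod carrying explicit limits, a ResourceQuota that requires limits.cpu and limits.memory would no longer have a reason to reject the pod on account of the init container.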

Further details

Having resource requests/limits configured for all containers in all pods the runner creates allows users to restrict resource usage for CI tasks by using Kubernetes ResourceQuota objects.

Letting the cluster enforce these quotas can allow for a more flexible distribution of CI jobs. For example, I currently limit my one runner to concurrency = 1, because its maximum resource usage fits in my cluster only once. There is no way to configure the maximum total resource usage in the runner, and frankly, what if I run two runners for the same Kubernetes cluster? The best location to configure these maximum total resource limits is in the Kubernetes cluster.

What does success look like, and how can we measure that?

This feature is implemented correctly when the gitlab-runner can be configured to successfully create pods in a namespace with ResourceQuota objects.