Kubernetes executor helper container memory usage

Summary

The helper container is killed when its memory request and limit are both set to 256Mi, and the CI job output shows "error: --shallow-file died of signal 9". The job succeeds when the limit is increased to 512Mi, but that seems an excessive amount of memory for a helper container.

The repository in question is only 61.6 MB in size.

Steps to reproduce

Deploy GitLab Runner using the Helm chart (here through a Terraform helm_release resource):

resource "helm_release" "runners" {
  name      = "gitlab-runners"
  namespace = kubernetes_namespace.gitlab.metadata[0].name

  repository = "https://charts.gitlab.io/"
  chart      = "gitlab-runner"
  version    = "0.40.0"
  skip_crds  = true

  timeout = 1800

  values = [
    yamlencode({
      gitlabUrl         = "https://gitlab.url/"
      unregisterRunners = true
      checkInterval     = 1
      concurrent        = 20
      logLevel          = "info"
      logFormat         = "json"
      metrics = {
        enabled = true
        serviceMonitor = {
          enabled = true
        }
      }
      rbac = {
        clusterWideAccess = false
        create            = true
        serviceAccountAnnotations = {
          "eks.amazonaws.com/role-arn" = module.eks_iam_role_gitlab_runner.arn
        }
      }
      resources = {
        limits = {
          cpu    = "500m"
          memory = "512Mi"
        }
        requests = {
          cpu    = "200m"
          memory = "256Mi"
        }
      }
      runners = {
        executor        = "kubernetes"
        imagePullPolicy = "always"
        locked          = false
        outputLimit     = 10240
        protected       = false
        runUntagged     = true
        secret          = kubernetes_secret.registration_token.metadata[0].name
        namespace       = kubernetes_namespace.gitlab_ci.metadata[0].name
        tags            = "k8s"
        config          = <<-EOT
        [[runners]]
        [runners.kubernetes]
            privileged = false
            image = "alpine:latest"
            pull_policy = "always"
            cpu_request = "500m"
            cpu_request_overwrite_max_allowed = "2000m"
            memory_request = "1024Mi"
            memory_request_overwrite_max_allowed = "4096Mi"
            cpu_limit = "1000m"
            cpu_limit_overwrite_max_allowed = "4000m"
            memory_limit = "1024Mi"
            memory_limit_overwrite_max_allowed = "4096Mi"
            helper_cpu_request = "500m"
            helper_cpu_request_overwrite_max_allowed = "1000m"
            helper_memory_request = "256Mi"
            helper_memory_request_overwrite_max_allowed = "2048Mi"
            helper_cpu_limit = "1000m"
            helper_cpu_limit_overwrite_max_allowed = "2000m"
            helper_memory_limit = "256Mi"
            helper_memory_limit_overwrite_max_allowed = "2048Mi"
            service_cpu_request = "100m"
            service_cpu_request_overwrite_max_allowed = "2000m"
            service_memory_request = "128Mi"
            service_memory_request_overwrite_max_allowed = "2048Mi"
            service_cpu_limit = "1000m"
            service_cpu_limit_overwrite_max_allowed = "4000m"
            service_memory_limit = "128Mi"
            service_memory_limit_overwrite_max_allowed = "2048Mi"
            [runners.cache]
            Type = "s3"
            Shared = true
            Path = "my/bucket/path"
            [runners.cache.s3]
                ServerAddress = "s3.amazonaws.com"
                BucketName = "${aws_s3_bucket.gitlab_ci_cache.bucket}"
                BucketLocation = "${aws_s3_bucket.gitlab_ci_cache.region}"
                Insecure = false
      EOT
      }
    })
  ]
}

Run a CI job through the runner and observe the output:

Created fresh repository.
error: --shallow-file died of signal 9
fatal: unpack-objects failed

This happens only on some repositories, but once a repository is affected, every subsequent job fails the same way. So far, our workarounds have been either to override the memory limit in the project using the "KUBERNETES_HELPER_MEMORY_LIMIT" variable, or to delete the repository and create a new one (which somehow fixes the issue).
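
For reference, a minimal sketch of the per-project override, set as a variable in the project's .gitlab-ci.yml. The job name and script are placeholders, not taken from an affected project. The 512Mi value is the limit that worked for us, and the override is only honoured because helper_memory_limit_overwrite_max_allowed is set (to 2048Mi) in the runner config above.

```yaml
# .gitlab-ci.yml (sketch; job name and script are hypothetical placeholders)
variables:
  # Raise the helper container memory limit for this project only.
  # The value must not exceed helper_memory_limit_overwrite_max_allowed
  # from the runner config (2048Mi in the setup above).
  KUBERNETES_HELPER_MEMORY_LIMIT: "512Mi"

build:
  tags:
    - k8s
  script:
    - echo "job body goes here"
```

This hides the problem for a single project; it does not explain why a 61.6 MB repository needs more than 256Mi in the helper container.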

Setting GIT_STRATEGY to "fetch" or "clone" has no impact.

Actual behaviour

The helper container is killed due to memory constraints (signal 9 is SIGKILL) and the following message is displayed in the CI job log:

Created fresh repository.
error: --shallow-file died of signal 9
fatal: unpack-objects failed

Expected behaviour

The job should run successfully with the helper memory limit set to 256Mi.

Used GitLab Runner version

Using the kubernetes executor on.

Version:      14.10.0
Git revision: c6bb62f6
Git branch:   14-10-stable
GO version:   go1.17.7
Built:        2022-04-19T18:17:10+0000
OS/Arch:      linux/amd64

Possible fixes

Possibly a bug in the git clone code in the helper pod. Possibly an issue with garbage collection and memory utilisation.