Kubernetes executor helper container memory usage
Summary
The helper container is being OOM-killed when the helper memory request and limit are set to 256Mi, and the "error: --shallow-file died of signal 9" message is displayed in the CI job output. The job succeeds when the limit is increased to 512Mi, but 512Mi seems an excessive resource requirement for a helper container.
The repository in question is only 61.6 MB in size.
Steps to reproduce
Deploy the GitLab Runner using the Helm chart:
resource "helm_release" "runners" {
name = "gitlab-runners"
namespace = kubernetes_namespace.gitlab.metadata[0].name
repository = "https://charts.gitlab.io/"
chart = "gitlab-runner"
version = "0.40.0"
skip_crds = true
timeout = 1800
values = [
yamlencode({
gitlabUrl = "https://gitlab.url/"
unregisterRunners = true
checkInterval = 1
concurrent = 20
logLevel = "info"
logFormat = "json"
metrics = {
enabled = true
serviceMonitor = {
enabled = true
}
}
rbac = {
clusterWideAccess = false
create = true
serviceAccountAnnotations = {
"eks.amazonaws.com/role-arn" = module.eks_iam_role_gitlab_runner.arn
}
}
resources = {
limits = {
cpu = "500m"
memory = "512Mi"
}
requests = {
cpu = "200m"
memory = "256Mi"
}
}
runners = {
executor = "kubernetes"
imagePullPolicy = "always"
locked = false
outputLimit = 10240
protected = false
runUntagged = true
secret = kubernetes_secret.registration_token.metadata[0].name
namespace = kubernetes_namespace.gitlab_ci.metadata[0].name
tags = "k8s"
config = <<-EOT
[[runners]]
[runners.kubernetes]
privileged = false
image = "alpine:latest"
pull_policy = "always"
cpu_request = "500m"
cpu_request_overwrite_max_allowed = "2000m"
memory_request = "1024Mi"
memory_request_overwrite_max_allowed = "4096Mi"
cpu_limit = "1000m"
cpu_limit_overwrite_max_allowed = "4000m"
memory_limit = "1024Mi"
memory_limit_overwrite_max_allowed = "4096Mi"
helper_cpu_request = "500m"
helper_cpu_request_overwrite_max_allowed = "1000m"
helper_memory_request = "256Mi"
helper_memory_request_overwrite_max_allowed = "2048Mi"
helper_cpu_limit = "1000m"
helper_cpu_limit_overwrite_max_allowed = "2000m"
helper_memory_limit = "256Mi"
helper_memory_limit_overwrite_max_allowed = "2048Mi"
service_cpu_request = "100m"
service_cpu_request_overwrite_max_allowed = "2000m"
service_memory_request = "128Mi"
service_memory_request_overwrite_max_allowed = "2048Mi"
service_cpu_limit = "1000m"
service_cpu_limit_overwrite_max_allowed = "4000m"
service_memory_limit = "128Mi"
service_memory_limit_overwrite_max_allowed = "2048Mi"
[runners.cache]
Type = "s3"
Shared = true
Path = "my/bucket/path"
[runners.cache.s3]
ServerAddress = "s3.amazonaws.com"
BucketName = "${aws_s3_bucket.gitlab_ci_cache.bucket}"
BucketLocation = "${aws_s3_bucket.gitlab_ci_cache.region}"
Insecure = false
EOT
}
})]
}
Run a CI job through the runner. A minimal job is enough to trigger the failure, since it occurs while the helper container checks out the sources, before the job script ever runs.
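For reference, a hypothetical minimal .gitlab-ci.yml for an affected project; the job name and script are illustrative, and the k8s tag matches the runner configuration above:

# Hypothetical minimal job; name and script are illustrative.
# The failure happens during the checkout performed by the helper container,
# so on an affected repository the script below never actually runs.
reproduce:
  tags:
    - k8s
  script:
    - echo "checkout already failed before this point"

The checkout step then fails with the following output: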
Created fresh repository.
error: --shallow-file died of signal 9
fatal: unpack-objects failed
This only happens on some repositories, but once it starts happening it happens continuously. Until now, our workaround has been to either override the helper memory limit in the affected project using "KUBERNETES_HELPER_MEMORY_LIMIT" (see the sketch below), or to delete the repository and create a new one (which somehow fixes the issue).
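For reference, a minimal sketch of that per-project workaround, assuming the variable is set in the affected project's .gitlab-ci.yml; 512Mi is the value that works for us and is within the helper_memory_limit_overwrite_max_allowed of 2048Mi configured above:

variables:
  # Per-project override of the helper memory limit; 512Mi is the value that works for us.
  # The request can be overridden the same way via KUBERNETES_HELPER_MEMORY_REQUEST if desired.
  KUBERNETES_HELPER_MEMORY_LIMIT: "512Mi"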
Setting GIT_STRATEGY to "fetch" or "clone" has no impact.
Actual behaviour
The helper container is OOM-killed due to the 256Mi memory limit and the following message is displayed in the CI job logs:
Created fresh repository.
error: --shallow-file died of signal 9
fatal: unpack-objects failed
Expected behaviour
The job should run successfully.
Used GitLab Runner version
Using the Kubernetes executor.
Version: 14.10.0
Git revision: c6bb62f6
Git branch: 14-10-stable
GO version: go1.17.7
Built: 2022-04-19T18:17:10+0000
OS/Arch: linux/amd64
Possible fixes
Possibly a bug in the git clone code in the helper container, or possibly an issue with garbage collection and memory utilisation.
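Since the error message references --shallow-file, one untested diagnostic (a suggestion only, not verified in this report) would be to disable shallow cloning for an affected project and check whether the helper still exceeds 256Mi; GIT_DEPTH is the standard CI/CD variable that controls clone depth:

variables:
  # Untested diagnostic sketch: GIT_DEPTH set to 0 disables shallow cloning,
  # forcing a full clone instead of a shallow fetch.
  GIT_DEPTH: 0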