Helper container OOM-killed (exit code 137) even though sufficient resources are available
Summary
Our helper containers are frequently killed with exit code 137 (OOM kill), even though sufficient resources are available on the node.
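For reference, exit code 137 follows the usual convention of 128 plus the number of the signal that terminated the process: 137 = 128 + 9, and signal 9 is SIGKILL on Linux. The kernel OOM killer terminates a container with SIGKILL. Anything else that sends signal 9 produces the same code, though, so the exit code is consistent with an OOM kill but does not prove one. The small sketch below decodes such codes using only the Python standard library and assumes POSIX signal numbering. `describe_exit_code` is a hypothetical helper and is not part of GitLab Runner.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a process exit code using the shell's 128 + signal convention."""
    if code > 128:
        try:
            sig = signal.Signals(code - 128)
        except ValueError:
            # No signal with this number exists on the current platform.
            return f"killed by unknown signal {code - 128}"
        return f"killed by {sig.name} (signal {sig.value})"
    return f"exited with status {code}"


print(describe_exit_code(137))
```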
Steps to reproduce
Our jobs pull a cache stored in MinIO as well as artefacts from previous jobs. Jobs fail intermittently while performing these downloads, even though sufficient resources are available on the node.
.gitlab-ci.yml
DetektAnalysis:
  stage: code_quality_checks
  cache:
    paths:
      - .gradle/wrapper
      - .gradle/caches
    policy: pull
  script:
    - export IS_ANALYZE_PHASE="true"
    - mv gradle/ci-configs/mid-config-gradle.properties gradle.properties
    - ruby scripts/ci-scripts/DetektAnalysis.rb
  artifacts:
    expire_in: 1 week
    paths:
      - pages
      - detekt_report
      - detekt-combined-results.xml
      - changed_modules.csv
  dependencies:
    - BuildIntegrationFeature
    - FetchMergeRequestChangedFiles
  only:
    refs:
      - merge_requests
    variables:
      - $CI_COMMIT_REF_NAME =~ /^(feature|fix|hotfix|cherry-pick|spike|sync|task)\/(?:.+)/
  tags:
    - consumer-app-build
    - mid-config
  interruptible: true
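The `variables` entry under `only` restricts the job to refs whose name matches the branch-prefix expression. The minimal sketch below shows which names satisfy it. It uses Python's `re` as a stand-in for GitLab's own regex engine, on the assumption that both treat this simple pattern the same way. One of the inputs is the branch from failing job log 2. `job_would_run` is a hypothetical helper, and it checks only the `variables` condition, not the `merge_requests` ref condition.

```python
import re

# The variables expression from the DetektAnalysis job, copied verbatim.
BRANCH_PATTERN = re.compile(r"^(feature|fix|hotfix|cherry-pick|spike|sync|task)\/(?:.+)")


def job_would_run(ref_name: str) -> bool:
    """Return True if a CI_COMMIT_REF_NAME value satisfies the variables rule."""
    return BRANCH_PATTERN.search(ref_name) is not None


for ref in ["task/delete_customoji", "feature/login", "master", "tasks/x"]:
    print(ref, job_would_run(ref))
```

A name such as `tasks/x` is rejected because the prefix must be followed directly by a slash.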
Actual behavior
The job fails with "ERROR: Job failed: command terminated with exit code 137".
Expected behavior
The job should succeed, since sufficient memory is available on the node.
Relevant logs and/or screenshots
Failing job log 1 (artefact download failure)
Running with gitlab-runner 13.3.1 (738bbe5a)
on mid-config-runner-gitlab-runner-bfdf477f-69h8q
Preparing the "kubernetes" executor
Using Kubernetes namespace: default
Using Kubernetes executor with image asia.gcr.io/systems-0001/android-ci-sdk ...
Preparing environment
Waiting for pod default/runner-ucezt9ga-project-2339-concurrent-3bgkh9 to be running, status is Pending
Waiting for pod default/runner-ucezt9ga-project-2339-concurrent-3bgkh9 to be running, status is Pending
Waiting for pod default/runner-ucezt9ga-project-2339-concurrent-3bgkh9 to be running, status is Pending
Running on runner-ucezt9ga-project-2339-concurrent-3bgkh9 via mid-config-runner-gitlab-runner-bfdf477f-69h8q...
Getting source from Git repository
Fetching changes with git depth set to 100...
Initialized empty Git repository in /builds/mobile/GoHost/.git/
Created fresh repository.
Checking out 6e09b594 as refs/merge-requests/14956/head...
Skipping Git submodules setup
Downloading artifacts
Downloading artifacts for BuildIntegrationFeature (10498748)...
ERROR: Job failed: command terminated with exit code 137
Failing job log 2 (MinIO cache pull failure)
Running with gitlab-runner 13.3.1 (738bbe5a)
on commit-high-config-runner-gitlab-runner-64bd4cc4bb-6qss8
Preparing the "kubernetes" executor
Using Kubernetes namespace: default
Using Kubernetes executor with image asia.gcr.io/systems-0001/ubuntu-android-ci-sdk ...
Preparing environment
Waiting for pod default/runner-sc8s9r8w-project-2339-concurrent-62kl4r to be running, status is Pending
Waiting for pod default/runner-sc8s9r8w-project-2339-concurrent-62kl4r to be running, status is Pending
Waiting for pod default/runner-sc8s9r8w-project-2339-concurrent-62kl4r to be running, status is Pending
Waiting for pod default/runner-sc8s9r8w-project-2339-concurrent-62kl4r to be running, status is Pending
Waiting for pod default/runner-sc8s9r8w-project-2339-concurrent-62kl4r to be running, status is Pending
Waiting for pod default/runner-sc8s9r8w-project-2339-concurrent-62kl4r to be running, status is Pending
Waiting for pod default/runner-sc8s9r8w-project-2339-concurrent-62kl4r to be running, status is Pending
Running on runner-sc8s9r8w-project-2339-concurrent-62kl4r via commit-high-config-runner-gitlab-runner-64bd4cc4bb-6qss8...
Getting source from Git repository
Fetching changes with git depth set to 100...
Initialized empty Git repository in /builds/mobile/GoHost/.git/
Created fresh repository.
Checking out 80374c25 as task/delete_customoji...
Skipping Git submodules setup
Restoring cache
Checking cache for default-17...
Downloading cache.zip from http://consumer-app-minio.default.svc.cluster.local:9000/consumer-app-minio-cache/gitlab_runner/project/2339/default-17
Uploading artifacts for failed job
ERROR: Job failed: command terminated with exit code 137
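Failing job log 2 shows the exact object the helper was downloading when it was killed. The URL can be rebuilt from the cache settings in config.yml, which makes it easy to look up that object's size in MinIO. This assumes the runner's usual cache layout:

- the configured s3CachePath prefix;
- a `runner/<short token>` segment, added only when the cache is not shared;
- then `project/<id>/<cache key>`.

`cache_object_name` and `cache_url` are illustrative helpers, not runner APIs. The short token is taken from the job pod name, and the MinIO host is copied from the log.

```python
import posixpath


def cache_object_name(prefix: str, shared: bool, short_token: str,
                      project_id: int, cache_key: str) -> str:
    """Build the object name for a cache key (assumed layout, see above)."""
    parts = [prefix]
    if not shared:
        # Non-shared caches are namespaced per runner.
        parts += ["runner", short_token]
    parts += ["project", str(project_id), cache_key]
    return posixpath.join(*parts)


def cache_url(server: str, bucket: str, object_name: str) -> str:
    """Path-style S3 URL over plain HTTP, matching s3CacheInsecure: true."""
    return f"http://{server}/{bucket}/{object_name}"


name = cache_object_name("gitlab_runner", True, "sc8s9r8w", 2339, "default-17")
print(cache_url("consumer-app-minio.default.svc.cluster.local:9000",
                "consumer-app-minio-cache", name))
```

With `cacheShared: true` the runner segment is dropped, which matches the URL in the log.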
Environment description
This is a custom installation.
- GKE version: 1.15.20
- Node OS: COS
- GitLab Runner version: 13.3.1
- GitLab Runner chart version: 0.20
- Size of the artifact being downloaded: 80 MB
- MinIO cache size: 650 MB
config.yml contents
image: gitlab/gitlab-runner:alpine-v13.3.1
imagePullPolicy: "IfNotPresent"
gitlabUrl: "url here"
runnerRegistrationToken: "token"
unregisterRunners: true
terminationGracePeriodSeconds: 3600
concurrent: 8
checkInterval: 3
logLevel: debug
rbac:
  create: true
  clusterWideAccess: false
metrics:
  enabled: true
nodeSelector:
  node-type: master-node
runners:
  image: asia.gcr.io/systems-0001/android-ci-sdk
  imagePullPolicy: "always"
  locked: true
  tags: "consumer-app-build,mid-config"
  privileged: false
  pollTimeout: 600
  outputLimit: 10240
  cache:
    cacheType: s3
    cacheShared: true
    s3ServerAddress: minio-path.default.svc.cluster.local:9000
    s3CacheInsecure: true
    s3BucketName: consumer-app-minio-cache
    s3CachePath: "gitlab_runner"
    secretName: secret
  builds:
    cpuLimit: 5000m
    memoryLimit: 7Gi
    cpuRequests: 5000m
    memoryRequests: 7Gi
  services: {}
  helpers:
    cpuLimit: 500m
    memoryLimit: 1000Mi
    cpuRequests: 500m
    memoryRequests: 1000Mi
  podLabels:
    podName: mid-config
    jobType: ${CI_JOB_NAME}
  nodeSelector:
    node-type: mid-config
envVars:
  - name: KUBERNETES_POLL_INTERVAL
    value: 10
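Keep in mind that Kubernetes enforces memory limits per container. The helper is OOM-killed once its own usage exceeds its memoryLimit (1000Mi here), however much memory is still free on the node.

The sketch below converts the builds and helpers quantities and puts the per-pod totals next to the cache size from the environment description. It rests on three assumptions:

- `services: {}` means no service containers;
- "650 MB" means decimal megabytes;
- `parse_quantity` is a hypothetical helper that handles only a few common suffixes.

The arithmetic cannot show whether the helper's memory use actually grows with the archive size. It only puts the numbers side by side.

```python
from decimal import Decimal

# A subset of the suffixes allowed in Kubernetes resource quantities.
SUFFIXES = {
    "Ki": Decimal(2**10), "Mi": Decimal(2**20), "Gi": Decimal(2**30), "Ti": Decimal(2**40),
    "k": Decimal(10**3), "M": Decimal(10**6), "G": Decimal(10**9), "T": Decimal(10**12),
    "m": Decimal("0.001"),  # milli, used for CPU
}


def parse_quantity(quantity: str) -> Decimal:
    """Convert a quantity such as '7Gi' (bytes) or '500m' (cores) to a number."""
    for suffix, factor in SUFFIXES.items():
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * factor
    return Decimal(quantity)


# Values copied from runners.builds and runners.helpers in config.yml.
builds = {"cpu": "5000m", "memory": "7Gi"}
helpers = {"cpu": "500m", "memory": "1000Mi"}

pod_cpu = parse_quantity(builds["cpu"]) + parse_quantity(helpers["cpu"])
pod_mem = parse_quantity(builds["memory"]) + parse_quantity(helpers["memory"])
helper_limit = parse_quantity(helpers["memory"])
cache_zip = Decimal(650) * 10**6  # assumption: "650 MB" in decimal megabytes

print(f"CPU requested per job pod: {pod_cpu.normalize()} cores")
print(f"Memory requested per job pod: {float(pod_mem) / 2**30:.2f} GiB")
print(f"Helper memory limit: {helper_limit} bytes")
print(f"Cache archive vs. helper limit: {float(cache_zip / helper_limit):.0%}")
```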
Used GitLab Runner version
Running with gitlab-runner 13.3.1 (738bbe5a)
on mid-config-runner-gitlab-runner-bfdf477f-69h8q ucezt9ga
Preparing the "kubernetes" executor
Possible fixes
We are not sure about the cause, so guidance on the ideal helper resource configuration would be very helpful. This setup worked fine previously; the failures started only after we upgraded GitLab Runner from v12.9 to v13.3.