Kubernetes Executor OOMKilled Due to Rapid Memory Allocation (False Positive OOM Kill)
Summary
We are experiencing frequent OOMKilled failures in our GitLab Runner Kubernetes executor when running Java-based build jobs for one of our customers. The affected jobs allocate memory quickly but remain within the configured memory limits. Kubernetes appears to terminate the build container prematurely due to the rate of memory allocation rather than actual memory pressure.
This leads to multiple failed pipeline runs per day and requires manual restarts or adjustments.
Runner / Environment Details
- GitLab Runner version: 18.5.0 (bda84871)
- Executor: Kubernetes
- Namespace: gitlab-runner
- Pod strategy: attach
- Image: customer-provided build container
- Pod termination cause: Error in container build: exit code: 137, reason: 'OOMKilled'
Observed Behavior
The build pod stays in Pending for an extended period due to:
ContainersNotReady: "containers with unready status: [build helper]"
Once the job starts, the Java process fails:
ERROR: Job failed (system failure): Error in container build: exit code: 137, reason: 'OOMKilled'
This occurs multiple times per day for our customer.
Reproduction / Minimal Example
A minimal example demonstrating the issue:
java -XX:+UseContainerSupport -XX:+AlwaysPreTouch -XX:InitialRAMPercentage=30 -XX:MaxRAMPercentage=30 HelloWorld.java
Removing -XX:+AlwaysPreTouch prevents the OOMKill.
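For reference, HelloWorld.java is nothing more than a trivial class along the lines of the sketch below (the exact content is irrelevant to the failure, since AlwaysPreTouch pre-touches the initial heap during JVM startup, before main runs):

```java
// HelloWorld.java - trivial workload used for the reproduction.
// The OOMKill occurs while the JVM pre-touches ~30% of the container
// memory for the heap (-XX:+AlwaysPreTouch -XX:InitialRAMPercentage=30),
// not because of anything this class allocates itself.
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello from the build container");
    }
}
```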
Important Observation
The issue is not caused by actual memory exhaustion.
Increasing runner sizes delays the problem but does not eliminate it.
Memory allocation tests:
- Allocating ~15% of total memory at once → works
- Allocating ~20% at once → triggers immediate OOMKill, even though ~80% memory is still available
This indicates that Kubernetes or the container runtime reacts to allocation speed, not total usage. It seems kubelet predicts future memory consumption and kills the pod pre-emptively, although in our workload the allocation stops shortly afterwards.
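The 15% / 20% allocation tests were run with a small throwaway program along the lines of the sketch below (the class name AllocBurst and the chunk sizes are illustrative, not our exact test code). It allocates a chosen fraction of the JVM's max heap in one burst, touches every page so the memory is actually committed, and then idles so usage stays flat:

```java
// AllocBurst.java - illustrative sketch of the allocation-rate test described above.
// Allocates a given fraction of the JVM's max heap in one burst, touches every
// page so the OS commits the memory, then idles so total usage stays constant.
public class AllocBurst {
    public static void main(String[] args) throws InterruptedException {
        double fraction = args.length > 0 ? Double.parseDouble(args[0]) : 0.20;
        long maxHeap = Runtime.getRuntime().maxMemory();
        long target = (long) (maxHeap * fraction);

        int chunkSize = 64 * 1024 * 1024; // 64 MiB chunks
        int chunks = (int) (target / chunkSize);
        byte[][] hold = new byte[chunks][];

        for (int i = 0; i < chunks; i++) {
            hold[i] = new byte[chunkSize];
            // Touch one byte per 4 KiB page so the memory is actually resident.
            for (int off = 0; off < chunkSize; off += 4096) {
                hold[i][off] = 1;
            }
        }

        System.out.printf("Allocated ~%d MiB (%.0f%% of max heap), now idling%n",
                (long) chunks * chunkSize / (1024 * 1024), fraction * 100);
        Thread.sleep(60_000); // keep the memory resident; usage stays below the limit
    }
}
```

In our environment, running this with a fraction around 0.15 completes normally, while 0.20 is killed almost immediately, even though the container is nowhere near its configured memory limit.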
Impact
- Multiple job failures per day
- Manual reruns required
- Delays for the customer
- Larger runner sizing only partially mitigates the issue and increases cost
Expected Behavior
A pod should not be OOMKilled as long as it stays under the configured memory limit.
The Java process allocates memory quickly but stays well below the limit overall.
What We Suspect
Potential contributing factors include:
- Kubernetes memory throttling or prediction mechanisms
- Container runtime conservative memory heuristics
- Java AlwaysPreTouch rapidly touching memory pages
- Runner helper container or QoS class affecting memory behavior
This likely leads to false positive OOMKills.
Request for GitLab Team
We kindly request support from the GitLab team regarding:
- Investigation into whether the GitLab Runner or its Kubernetes executor configuration triggers overly aggressive memory-related pod termination.
- Recommended configuration to prevent OOMKills caused by rapid-but-controlled memory allocation patterns:
  - cgroup memory tuning
  - helper container memory settings
  - QoS / pod spec configuration
  - known issues or workarounds with Java + Kubernetes executors
- Information on whether this behavior is known or documented for GitLab Runner.
Logs (Excerpt)
Running with gitlab-runner 18.5.0 (bda84871)
Using Kubernetes executor
…
ContainersNotReady: "containers with unready status: [build helper]"
…
ERROR: Job failed (system failure): exit code: 137, reason: 'OOMKilled'
Conclusion
This appears to be an unintended interaction between GitLab Runner’s Kubernetes executor and Java’s rapid memory allocation behavior (in particular -XX:+AlwaysPreTouch), resulting in pods being OOMKilled even though they stay within their configured memory limits. Guidance on configuration changes or known workarounds to prevent these false positive OOMKills would be greatly appreciated.