Kubernetes Executor OOMKilled Due to Rapid Memory Allocation (False Positive OOM Kill)
Summary
We are experiencing frequent OOMKilled failures in our GitLab Runner Kubernetes executor when running Java-based build jobs for one of our customers. The affected jobs allocate memory quickly but remain within the configured memory limits. Kubernetes appears to terminate the build container prematurely due to the rate of memory allocation rather than actual memory pressure.
This leads to multiple failed pipeline runs per day and requires manual restarts or adjustments.
Runner / Environment Details
- GitLab Runner version: 18.5.0 (bda84871)
- Executor: Kubernetes
- Namespace: gitlab-runner
- Pod strategy: attach
- Image: customer-provided build container
- Pod termination cause: Error in container build: exit code: 137, reason: 'OOMKilled'
Observed Behavior
The build pod stays in Pending for an extended period due to:
ContainersNotReady: "containers with unready status: [build helper]"
Once the job starts, the Java process fails:
ERROR: Job failed (system failure): Error in container build: exit code: 137, reason: 'OOMKilled'
This occurs multiple times per day for our customer.
Reproduction / Minimal Example
A minimal example demonstrating the issue:
java -XX:+UseContainerSupport -XX:+AlwaysPreTouch -XX:InitialRAMPercentage=30 -XX:MaxRAMPercentage=30 HelloWorld.java
Removing -XX:+AlwaysPreTouch prevents the OOMKill.
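For reference, HelloWorld.java is nothing more than a trivial class along the lines of the sketch below (the exact content is irrelevant to the failure, since AlwaysPreTouch pre-touches the initial heap during JVM startup, before main runs):

```java
// HelloWorld.java - trivial workload used for the reproduction.
// The OOMKill occurs while the JVM pre-touches ~30% of the container
// memory for the heap (-XX:+AlwaysPreTouch -XX:InitialRAMPercentage=30),
// not because of anything this class allocates itself.
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello from the build container");
    }
}
```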
Important Observation
The issue is not caused by actual memory exhaustion.
Increasing runner sizes delays the problem but does not eliminate it.
Memory allocation tests:
- Allocating ~15% of total memory at once → works
- Allocating ~20% at once → triggers immediate OOMKill, even though ~80% memory is still available
This indicates that Kubernetes or the container runtime reacts to allocation speed, not total usage. It seems kubelet predicts future memory consumption and kills the pod pre-emptively, although in our workload the allocation stops shortly afterwards.
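The 15% / 20% allocation tests were run with a small throwaway program along the lines of the sketch below (the class name AllocBurst and the chunk sizes are illustrative, not our exact test code). It allocates a chosen fraction of the JVM's max heap in one burst, touches every page so the memory is actually committed, and then idles so usage stays flat:

```java
// AllocBurst.java - illustrative sketch of the allocation-rate test described above.
// Allocates a given fraction of the JVM's max heap in one burst, touches every
// page so the OS commits the memory, then idles so total usage stays constant.
public class AllocBurst {
    public static void main(String[] args) throws InterruptedException {
        double fraction = args.length > 0 ? Double.parseDouble(args[0]) : 0.20;
        long maxHeap = Runtime.getRuntime().maxMemory();
        long target = (long) (maxHeap * fraction);

        int chunkSize = 64 * 1024 * 1024; // 64 MiB chunks
        int chunks = (int) (target / chunkSize);
        byte[][] hold = new byte[chunks][];

        for (int i = 0; i < chunks; i++) {
            hold[i] = new byte[chunkSize];
            // Touch one byte per 4 KiB page so the memory is actually resident.
            for (int off = 0; off < chunkSize; off += 4096) {
                hold[i][off] = 1;
            }
        }

        System.out.printf("Allocated ~%d MiB (%.0f%% of max heap), now idling%n",
                (long) chunks * chunkSize / (1024 * 1024), fraction * 100);
        Thread.sleep(60_000); // keep the memory resident; usage stays below the limit
    }
}
```

In our environment, running this with a fraction around 0.15 completes normally, while 0.20 is killed almost immediately, even though the container is nowhere near its configured memory limit.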
Impact
- Multiple job failures per day
- Manual reruns required
- Delays for the customer
- Larger runner sizing only partially mitigates the issue and increases cost
Expected Behavior
A pod should not be OOMKilled as long as it stays under the configured memory limit.
The Java process allocates memory quickly but stays well below the limit overall.
What We Suspect
Potential contributing factors include:
- Kubernetes memory throttling or prediction mechanisms
- Container runtime conservative memory heuristics
- Java AlwaysPreTouch rapidly touching memory pages
- Runner helper container or QoS class affecting memory behavior
This likely leads to false positive OOMKills.
Request for GitLab Team
We kindly request support from the GitLab team regarding:
- Investigation into whether the GitLab Runner or its Kubernetes executor configuration triggers overly aggressive memory-related pod termination.
- Recommended configuration to prevent OOMKills caused by rapid-but-controlled memory allocation patterns:
  - cgroup memory tuning
  - helper container memory settings
  - QoS / pod spec configuration
  - known issues or workarounds with Java + Kubernetes executors
- Information on whether this behavior is known or documented for GitLab Runner.
Logs (Excerpt)
Running with gitlab-runner 18.5.0 (bda84871)
Using Kubernetes executor
…
ContainersNotReady: "containers with unready status: [build helper]"
…
ERROR: Job failed (system failure): exit code: 137, reason: 'OOMKilled'
Conclusion
This appears to be an unintended interaction between GitLab Runner’s Kubernetes executor and Java’s rapid memory allocation behavior (in particular -XX:+AlwaysPreTouch), resulting in pods being OOMKilled even though they stay within their configured memory limits. Guidance on configuration changes or known workarounds to prevent these false positive OOMKills would be greatly appreciated.