
Excessive memory consumption of restore_cache helper script.

Summary

The cache-extractor process launched by the restore_cache helper script gets Killed because it consumes too much memory. For some projects we have had to raise helper_memory_request and helper_memory_limit to 8 GiB so that they restore the build cache without corruption. Other projects still failed even with 8 GiB, so we recommended that their maintainers disable the build cache, because we don't want to start nodes with that much memory.

Steps to reproduce

  1. Build any project that generates a large build cache (well over 1 GiB; a Gradle, Scala, or Java project, for example).
  2. Retry the build to have the restore_cache script restore the cache from S3.
  3. Watch the script get Killed.
     3.1. Watch the build fail because the cache on disk is corrupted.

What is the current bug behavior?

For example, Gradle errors out with a "zip END header not found" error because a jar file in the restored cache is corrupted.
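To confirm that a failure like this comes from a truncated archive rather than from Gradle itself, the restored cache can be scanned for jar files whose zip structure cannot be read back. Below is a minimal Python sketch; the function name find_corrupted_archives and the default suffix list are illustrative assumptions, and it only checks zip integrity, not whether the cached files are otherwise valid.

```python
import zipfile
import zlib
from pathlib import Path


def find_corrupted_archives(root, suffixes=(".jar", ".zip")):
    """Return the zip-format files under `root` that cannot be read back.

    A truncated archive usually lacks its end-of-central-directory record,
    which Java reports as "zip END header not found" and Python's zipfile
    reports as BadZipFile.
    """
    bad = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            with zipfile.ZipFile(path) as zf:
                # testzip() decompresses every member and checks its CRC;
                # it returns the name of the first bad member, or None.
                if zf.testzip() is not None:
                    bad.append(path)
        except (zipfile.BadZipFile, zlib.error, EOFError, OSError):
            # Missing central directory, damaged deflate stream, or I/O error.
            bad.append(path)
    return bad
```

A non-empty result right after a cache restore would point to an interrupted extraction rather than to a problem in the build itself.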

What is the expected correct behavior?

  1. The restore_cache script ought not to consume so much memory that it gets killed by the OS.
  2. If the restore_cache script is killed by the OS, the job ought to fail, because continuing with a partially restored cache leads to corrupted files.
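Both expectations can be illustrated with a minimal Python sketch, assuming the cache is a plain zip archive (as the cache.zip in the log suggests). Members are streamed to disk through a fixed-size buffer, so peak memory stays roughly constant regardless of archive or file size. Files are written to a staging directory that only replaces the target once every member has been extracted. The function name extract_cache_streaming and the 1 MiB buffer are hypothetical choices; this does not reflect how gitlab-runner-helper's cache-extractor is actually implemented.

```python
import os
import shutil
import tempfile
import zipfile
from pathlib import Path

CHUNK_SIZE = 1 << 20  # 1 MiB copy buffer (hypothetical value)


def extract_cache_streaming(archive_path, dest_dir):
    """Extract `archive_path` into `dest_dir` without buffering whole files.

    Requires Python 3.9+ for Path.is_relative_to.
    """
    dest = Path(dest_dir).resolve()
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Staging lives next to dest, on the same filesystem, so os.replace is a rename.
    staging = Path(tempfile.mkdtemp(prefix=".cache-staging-", dir=dest.parent)).resolve()
    try:
        with zipfile.ZipFile(archive_path) as zf:
            for info in zf.infolist():
                target = (staging / info.filename).resolve()
                # Refuse entries that would escape the staging directory.
                if not target.is_relative_to(staging):
                    raise ValueError(f"unsafe path in archive: {info.filename}")
                if info.is_dir():
                    target.mkdir(parents=True, exist_ok=True)
                    continue
                target.parent.mkdir(parents=True, exist_ok=True)
                # zf.open() decompresses incrementally and verifies the CRC at EOF.
                with zf.open(info) as src, open(target, "wb") as dst:
                    shutil.copyfileobj(src, dst, CHUNK_SIZE)
        # Swap the fully extracted tree into place. If the process is killed
        # before this point, dest is untouched: at worst the cache is stale
        # or missing, never half-written.
        if dest.exists():
            shutil.rmtree(dest)
        os.replace(staging, dest)
    except BaseException:
        shutil.rmtree(staging, ignore_errors=True)
        raise
```

The staging step matters as much as the streaming: even if extraction is still killed for some other reason, the build never sees a partially written cache.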

Relevant logs and/or screenshots

/scripts-31249723-3162629342/restore_cache: line 167:   117 Killed                  '/usr/bin/gitlab-runner-helper' "cache-extractor" "--file" "---REDACTED---/cache.zip" "--timeout" "10" "--url" "https://---REDACTED---"
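In this log it is the gitlab-runner-helper cache-extractor child (PID 117) that is Killed; the restore_cache script itself survives long enough to report it, and the issue describes the build carrying on with the corrupted cache. A caller can tell that a child died from a signal: in bash the exit status is 128 plus the signal number (137 for SIGKILL). The following is a minimal Python sketch of a hypothetical wrapper, run_or_fail, that turns such a kill into a hard failure; it is an illustration, not the runner's actual restore_cache logic.

```python
import signal
import subprocess


def run_or_fail(cmd):
    """Run `cmd`; raise RuntimeError unless it exits with status 0.

    On POSIX, subprocess reports a child terminated by signal N as
    returncode == -N, so an out-of-memory kill (SIGKILL) appears as -9.
    """
    proc = subprocess.run(cmd)
    if proc.returncode < 0:
        name = signal.Signals(-proc.returncode).name
        # The child died mid-way: anything it wrote may be incomplete.
        raise RuntimeError(f"{cmd[0]} was killed by {name}; not using a partial cache")
    if proc.returncode != 0:
        raise RuntimeError(f"{cmd[0]} exited with status {proc.returncode}")
    return proc
```

Wrapping the cache-extractor invocation shown in the log this way would turn a silent OOM kill into a failed job instead of a build running against a corrupted cache.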

Output of checks

Possible fixes

¯\_(ツ)_/¯