Excessive memory consumption of restore_cache helper script.
Summary
The restore_cache helper script is killed (the shell reports Killed) because it consumes too much memory. We have had to raise helper_memory_request and helper_memory_limit to 8 GiB for some projects so that they restore the build cache without corruption. Other projects still failed even with 8 GiB, and we decided to recommend that their maintainers disable the build cache, because we don't want to start nodes with that much memory.
Steps to reproduce
- Build any project that ends up generating a large build cache (>> 1 GiB; a Gradle, Scala, or Java project comes to mind).
- Retry the build to have the restore_cache script restore the cache from S3.
- Watch the script get Killed.
  - Watch the build fail because the cache on disk is corrupted.
What is the current bug behavior?
For example, Gradle will error out with a "zip END header not found" error message due to a corrupted jar file.
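The "zip END header not found" message is consistent with a jar that was only partially written: the end-of-central-directory record sits at the very end of a zip file, so an extraction that is cut short loses it. The following is a minimal sketch, in Python purely for illustration, of how a truncated archive can be detected. The helper names is_intact_zip and make_sample_jar are hypothetical and are not part of the runner or of Gradle.

```python
import io
import zipfile


def is_intact_zip(data: bytes) -> bool:
    """Return True if data parses as a zip archive and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # testzip() returns the name of the first bad member, or None if all are fine.
            return archive.testzip() is None
    except zipfile.BadZipFile:
        # Raised when the end-of-central-directory record cannot be found or parsed.
        return False


def make_sample_jar() -> bytes:
    """Build a small in-memory archive standing in for a cached jar (illustrative only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
        archive.writestr("com/example/Hello.class", b"\xca\xfe\xba\xbe" * 256)
    return buffer.getvalue()


complete = make_sample_jar()
# Simulate an extraction that was killed halfway through writing the file.
truncated = complete[: len(complete) // 2]

print("complete archive intact:", is_intact_zip(complete))
print("truncated archive intact:", is_intact_zip(truncated))
```

A check along these lines could be run over the restored cache directory to confirm whether a failing build is caused by truncated files rather than by the build tool itself.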
What is the expected correct behavior?
- The restore_cache script ought not to consume so much memory that it gets killed by the OS.
- If the restore_cache script is killed by the OS, the job ought to fail, because the kill leaves corrupted files on disk.
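The second expectation amounts to treating "terminated by a signal" as a hard failure instead of continuing with a half-restored cache. The sketch below shows the idea in Python for illustration only; it is not how the runner or its generated shell script is implemented, and run_or_fail and KILLED_CHILD are hypothetical names. It assumes a POSIX system, where Python's subprocess module reports a child killed by signal N as return code -N (a shell would report the same event as exit status 128 + N, that is 137 for SIGKILL).

```python
import signal
import subprocess
import sys


def run_or_fail(command: list[str]) -> None:
    """Run command and raise if it exits non-zero or is terminated by a signal."""
    result = subprocess.run(command)
    if result.returncode < 0:
        # A negative return code means the child was terminated by that signal number.
        name = signal.Signals(-result.returncode).name
        raise RuntimeError(f"{command[0]} was terminated by {name}")
    if result.returncode != 0:
        raise RuntimeError(f"{command[0]} exited with status {result.returncode}")


# Stand-in for a cache extractor that gets killed: the child sends itself SIGKILL.
KILLED_CHILD = [
    sys.executable,
    "-c",
    "import os, signal; os.kill(os.getpid(), signal.SIGKILL)",
]

try:
    run_or_fail(KILLED_CHILD)
except RuntimeError as error:
    print("job should fail:", error)
```

With a check like this around the extraction step, a killed extractor would stop the job immediately rather than surfacing later as a confusing build-tool error.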
Relevant logs and/or screenshots
/scripts-31249723-3162629342/restore_cache: line 167: 117 Killed '/usr/bin/gitlab-runner-helper' "cache-extractor" "--file" "---REDACTED---/cache.zip" "--timeout" "10" "--url" "https://---REDACTED---"
Output of checks
Possible fixes
¯\_(ツ)_/¯