restore_cache killed leaves partial cache behind
Summary
When the restore_cache helper step is killed by the out-of-memory (OOM) killer, the partially restored cache directory is left behind, so later steps in the job consume a corrupt cache.
Steps to reproduce
.gitlab-ci.yml
variables:
  PIP_CACHE_DIR: $CI_PROJECT_DIR/.cache/pip
  # Artificially limit helper memory to induce an out-of-memory kill.
  KUBERNETES_HELPER_MEMORY_REQUEST: 64Mi
  KUBERNETES_HELPER_MEMORY_LIMIT: 64Mi
cache:
  key: $CI_JOB_SLUG
  paths:
    - $CI_PROJECT_DIR/.cache/pip
build:
  script:
    - pip install -r requirements.txt
Actual behavior
The cache restore is attempted and then killed because the helper runs out of memory:
Checking cache for v1-poetry-1-protected...
Downloading cache from https://gitlab-runner-cache-vt-sidvps-prod-ue2.s3.dualstack.us-east-2.amazonaws.com/project/15341/v1-poetry-1-protected
/scripts-15341-48313476/restore_cache: line 227: 157 Killed '/usr/bin/gitlab-runner-helper' cache-extractor --file ../../../../cache/vt/si-devops/vehicle-manifest-builder/v1-poetry-1-protected/cache.zip --timeout 10 --url '[redacted]'
Failed to extract cache
Executing "step_script" stage of the job script
Later, a Python Poetry install that consumes the cache fails:
Installing the current project: vehicle-manifest-builder (0.7.2)
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/cleo/application.py", line 327, in run
File "/usr/local/lib/python3.10/site-packages/poetry/console/application.py", line 190, in _run
File "/usr/local/lib/python3.10/site-packages/cleo/application.py", line 431, in _run
File "/usr/local/lib/python3.10/site-packages/cleo/application.py", line 473, in _run_command
File "/usr/local/lib/python3.10/site-packages/cleo/application.py", line 457, in _run_command
File "/usr/local/lib/python3.10/site-packages/cleo/commands/base_command.py", line 117, in run
File "/usr/local/lib/python3.10/site-packages/cleo/commands/command.py", line 61, in execute
File "/usr/local/lib/python3.10/site-packages/poetry/console/commands/install.py", line 179, in handle
File "/usr/local/lib/python3.10/site-packages/poetry/masonry/builders/editable.py", line 47, in __init__
File "/usr/local/lib/python3.10/site-packages/poetry/core/masonry/builders/builder.py", line 42, in __init__
ModuleNotFoundError: No module named 'poetry.core.masonry.metadata'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/poetry", line 8, in <module>
File "/usr/local/lib/python3.10/site-packages/poetry/console/application.py", line 411, in main
File "/usr/local/lib/python3.10/site-packages/cleo/application.py", line 338, in run
File "/usr/local/lib/python3.10/site-packages/poetry/console/application.py", line 180, in render_error
File "/usr/local/lib/python3.10/site-packages/poetry/console/application.py", line 396, in _get_solution_provider_repository
ModuleNotFoundError: No module named 'crashtest'
Expected behavior
If restore_cache is killed, the cache directory should be removed, as if no cache existed.
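The expected behavior could be sketched roughly as follows. This is only an illustration, not the runner's actual restore_cache script: the function name `restore_or_clean` and its arguments are hypothetical, and it assumes the extraction command and the directory it writes into are known to the caller.

```shell
# Hypothetical wrapper (not part of gitlab-runner): run an extraction
# command and remove the target directory if the command fails or is killed.
# Usage: restore_or_clean <cache_dir> <command> [args...]
restore_or_clean() {
  local cache_dir="$1"
  shift
  local rc=0
  # Capture the exit status without aborting under `set -e`.
  # A process killed by SIGKILL (as the OOM killer does) is reported
  # by the shell as status 128 + 9.
  "$@" || rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "cache restore exited with status $rc; removing $cache_dir" >&2
    rm -rf -- "$cache_dir"
  fi
  return "$rc"
}
```

With a wrapper like this, a killed extraction would leave the job in the same state as a cache miss, so a later `pip install` or `poetry install` would rebuild its cache instead of reading partially extracted files.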
Relevant logs and/or screenshots
N/A
Environment description
- Self-managed GitLab
- gitlab-runner v16.5
- Kubernetes executor
Possible fixes
Edited by Aaron Borden