Killed restore_cache leaves a partial cache behind

Summary

When the restore_cache helper task is interrupted by an out-of-memory (OOM) kill, the partially extracted cache directory is left behind, so later job steps consume a corrupt cache.

Steps to reproduce

.gitlab-ci.yml
variables:
  PIP_CACHE_DIR: $CI_PROJECT_DIR/.cache/pip

  # Artificially limit helper memory to induce an out-of-memory kill.
  KUBERNETES_HELPER_MEMORY_REQUEST: 64Mi
  KUBERNETES_HELPER_MEMORY_LIMIT: 64Mi

cache:
  key: $CI_JOB_SLUG
  paths:
    - $CI_PROJECT_DIR/.cache/pip

build:
  script:
    - pip install -r requirements.txt

Actual behavior

The cache restore is attempted, and the extractor is killed because it runs out of memory:

Checking cache for v1-poetry-1-protected...
Downloading cache from https://gitlab-runner-cache-vt-sidvps-prod-ue2.s3.dualstack.us-east-2.amazonaws.com/project/15341/v1-poetry-1-protected 
/scripts-15341-48313476/restore_cache: line 227:   157 Killed                  '/usr/bin/gitlab-runner-helper' cache-extractor --file ../../../../cache/vt/si-devops/vehicle-manifest-builder/v1-poetry-1-protected/cache.zip --timeout 10 --url '[redacted]'
Failed to extract cache
Executing "step_script" stage of the job script

Then a later Python Poetry install that consumes the partially restored cache fails:

Installing the current project: vehicle-manifest-builder (0.7.2)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/cleo/application.py", line 327, in run
  File "/usr/local/lib/python3.10/site-packages/poetry/console/application.py", line 190, in _run
  File "/usr/local/lib/python3.10/site-packages/cleo/application.py", line 431, in _run
  File "/usr/local/lib/python3.10/site-packages/cleo/application.py", line 473, in _run_command
  File "/usr/local/lib/python3.10/site-packages/cleo/application.py", line 457, in _run_command
  File "/usr/local/lib/python3.10/site-packages/cleo/commands/base_command.py", line 117, in run
  File "/usr/local/lib/python3.10/site-packages/cleo/commands/command.py", line 61, in execute
  File "/usr/local/lib/python3.10/site-packages/poetry/console/commands/install.py", line 179, in handle
  File "/usr/local/lib/python3.10/site-packages/poetry/masonry/builders/editable.py", line 47, in __init__
  File "/usr/local/lib/python3.10/site-packages/poetry/core/masonry/builders/builder.py", line 42, in __init__
ModuleNotFoundError: No module named 'poetry.core.masonry.metadata'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/bin/poetry", line 8, in <module>
  File "/usr/local/lib/python3.10/site-packages/poetry/console/application.py", line 411, in main
  File "/usr/local/lib/python3.10/site-packages/cleo/application.py", line 338, in run
  File "/usr/local/lib/python3.10/site-packages/poetry/console/application.py", line 180, in render_error
  File "/usr/local/lib/python3.10/site-packages/poetry/console/application.py", line 396, in _get_solution_provider_repository
ModuleNotFoundError: No module named 'crashtest'

Expected behavior

If restore_cache is killed, the partially extracted cache directory should be removed so that the job proceeds as if no cache existed.
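
The log above shows the wrapping script surviving while its cache-extractor child is killed, so the cleanup could in principle happen in the wrapper. Below is a minimal sketch of that idea in Python, not the runner's actual implementation: the `restore_cache` function, its parameters, and the fake extractor used in the demo are all hypothetical names introduced for illustration. It assumes a POSIX system, where a child killed by a signal reports a negative return code.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def restore_cache(extract_cmd: list[str], cache_paths: list[Path]) -> bool:
    """Run a (hypothetical) extractor command; discard partial output on failure.

    Returns True if the extractor exited cleanly, False if it failed or was killed.
    """
    result = subprocess.run(extract_cmd)
    # On POSIX, returncode is negative when the child died from a signal
    # (for example -9 after an OOM SIGKILL), so any non-zero value is a failure.
    if result.returncode == 0:
        return True
    for path in cache_paths:
        # Remove whatever was partially written so the job sees "no cache".
        if path.is_dir():
            shutil.rmtree(path, ignore_errors=True)
        else:
            path.unlink(missing_ok=True)
    return False


if __name__ == "__main__":
    # Demo: a stand-in extractor that writes one file and then SIGKILLs itself,
    # imitating an extractor that is OOM-killed mid-extraction.
    cache_dir = Path(tempfile.mkdtemp()) / ".cache" / "pip"
    fake_extractor = (
        "import os, signal, sys; "
        "d = sys.argv[1]; os.makedirs(d); "
        "open(os.path.join(d, 'partial.whl'), 'w').write('truncated'); "
        "os.kill(os.getpid(), signal.SIGKILL)"
    )
    restored = restore_cache(
        [sys.executable, "-c", fake_extractor, str(cache_dir)], [cache_dir]
    )
    print("restored:", restored, "cache dir exists:", cache_dir.exists())
```

This only covers the case reported here, where the extractor child is killed and the wrapper keeps running. If the wrapper itself were killed, no cleanup code would run; extracting into a temporary directory and renaming it into place on success would be one way to handle that case as well.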

Relevant logs and/or screenshots

N/A

Environment description

  • Self-managed GitLab
  • gitlab-runner v16.5
  • Kubernetes executor

Possible fixes

Edited by Aaron Borden