Shell executor doesn't clean up build directories
Release notes
Previously, build output from your CI/CD pipeline runs could leave artifacts in the build directory, reducing the free space available on the disk. With the new FF_ENABLE_JOB_CLEANUP feature flag enabled (available as of 14.3, off by default), a step now runs at the end of the build to remove files in the build directory.
Problem
I have noticed that our Windows builders have started to fill their disks after only a day or two of building. Almost all of the space used is in \GitLabRunner\builds. It appears that the shell executor (which, as far as I can tell, is the only supported executor on Windows until #2609 (closed) or !706 (closed) is resolved) is not cleaning up build directories after builds have finished.
The behavior is the same for the shell executor on all platforms.
This problem is caused by builds where IsSharedEnv() is true, as well as builds run by e.g. the Docker executor when the /builds directory is host-mounted persistently.
Proposal
- Introduce a new FF_ENABLE_JOB_CLEANUP feature flag (disabled by default).
- Consolidate a new cleanup job stage with the existing cleanup_file_variables job stage. It is important for this phase to happen at the end of the job, after artifacts and cache have been uploaded, so that we can clean those up without interfering with the logic.
- Reuse the cleanup that is done at job startup and apply it to the project directory at the end of the job (subject to the FF_ENABLE_JOB_CLEANUP feature flag), while ensuring that no breaking behavior is introduced. The cleanup should depend on GIT_STRATEGY and should take submodules into consideration:

  | GIT_STRATEGY value | Cleanup action |
  | --- | --- |
  | fetch | git clean ${GIT_CLEAN_FLAGS} && git reset --hard, also for submodules depending on GIT_SUBMODULE_STRATEGY |
  | clone | rm -rf ${CI_PROJECT_DIR}/ |
  | none | do nothing |

In short, the behavior depending on the FF would be the following:
- FF disabled: Clean up on start.
- FF enabled: Clean up on start and finish.
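
The cleanup actions listed above can be sketched as a small shell function. This is only an illustration under stated assumptions, not the runner's actual implementation: job_cleanup is a hypothetical helper, the flag is assumed to be exported as the string "true", the fallback to fetch when GIT_STRATEGY is unset is a choice made for this sketch, and -ffdx is used as the fallback for GIT_CLEAN_FLAGS.

```shell
#!/usr/bin/env bash
# Sketch of the proposed end-of-job cleanup.
# job_cleanup is a hypothetical helper, not a GitLab Runner function.

job_cleanup() {
  local project_dir="$1"
  local strategy="${GIT_STRATEGY:-fetch}"        # assumed fallback for this sketch
  local clean_flags="${GIT_CLEAN_FLAGS:--ffdx}"  # assumed fallback flags

  # FF disabled: no cleanup at the end of the job.
  [ "${FF_ENABLE_JOB_CLEANUP:-false}" = "true" ] || return 0
  # Do nothing for an empty or missing path.
  [ -n "$project_dir" ] && [ -d "$project_dir" ] || return 0

  case "$strategy" in
    fetch)
      # GIT_CLEAN_FLAGS=none means "skip git clean".
      if [ "$clean_flags" != "none" ]; then
        # Word splitting is intentional: the variable may hold several flags.
        # shellcheck disable=SC2086
        git -C "$project_dir" clean $clean_flags || return 1
      fi
      git -C "$project_dir" reset --hard || return 1

      # Repeat the reset inside submodules when a submodule strategy is set.
      case "${GIT_SUBMODULE_STRATEGY:-none}" in
        normal)
          git -C "$project_dir" submodule foreach 'git reset --hard' ;;
        recursive)
          git -C "$project_dir" submodule foreach --recursive 'git reset --hard' ;;
      esac
      ;;
    clone)
      # ${var:?} aborts instead of expanding to "/" if the variable is empty.
      rm -rf "${project_dir:?}/"
      ;;
    none)
      : # do nothing
      ;;
    *)
      echo "unknown GIT_STRATEGY: $strategy" >&2
      return 1
      ;;
  esac
}
```

Calling `FF_ENABLE_JOB_CLEANUP=true GIT_STRATEGY=clone job_cleanup "$CI_PROJECT_DIR"` would remove the project directory, while the same call with the flag unset leaves it untouched. For brevity, the submodule branch only resets; it does not run git clean inside each submodule.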
To evaluate the implementation, consider the following scenario of a shell executor running inside an alpine:latest Docker container:
.gitlab-ci.yml
job:
  script:
    - truncate -s 2G artifact.bin
    - df -k -h ${CI_BUILDS_DIR}
  artifacts:
    paths:
      - artifact.bin
config.toml
concurrent = 1
check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "shell runner"
  url = "https://gitlab.com/"
  executor = "shell"
  shell = "bash"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
Running this job multiple times should not result in increasing values reported in the Used column of the df tool output. The same should be true with concurrent > 1 once we've saturated the number of allowed concurrent builds (meaning that a directory has been created for each of the possible concurrent jobs under $CI_BUILDS_DIR/$CI_RUNNER_SHORT_TOKEN).
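
One way to check this expectation without reading the df output by eye is to compare the Used column before and after a run. The sketch below assumes a POSIX df (the -P and -k options) and awk are available. used_kb and check_no_growth are hypothetical helper names, and the tolerance is an arbitrary allowance for unrelated disk activity.

```shell
#!/usr/bin/env bash
# used_kb and check_no_growth are hypothetical helpers for this sketch.

# Print the "Used" column (in KiB) of the filesystem that holds the given path.
# -P forces the single-line POSIX output format, -k reports 1024-byte units.
used_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $3 }'
}

# Succeed when usage after the run is at most before + tolerance (all in KiB).
check_no_growth() {
  local before_kb="$1" after_kb="$2" tolerance_kb="${3:-0}"
  [ "$after_kb" -le $(( before_kb + tolerance_kb )) ]
}

# Example usage around a job run:
#   before="$(used_kb "$CI_BUILDS_DIR")"
#   ... trigger the job and wait for it to finish ...
#   after="$(used_kb "$CI_BUILDS_DIR")"
#   check_no_growth "$before" "$after" 1024 || echo "disk usage grew" >&2
```

Because df reports usage for the whole filesystem, other activity on the same disk can affect the result, which is why the comparison accepts a tolerance.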