Shell executor doesn't clean up build directories
Release notes
Previously, builds from your CI/CD pipeline runs could leave artifacts behind in the build directory, reducing the free space available on the disk. With the FF_ENABLE_JOB_CLEANUP feature flag enabled (new in 14.3, off by default), a step now runs at the end of the build to remove files in the build directory.
Problem
I have noticed that our Windows builders have started to fill their disks after only a day or two of building. Almost all of the space used is in \GitLabRunner\builds. It appears that the shell executor (which as far as I can tell is the only supported executor on Windows until #2609 (closed) or !706 (closed) is resolved) is not cleaning up build directories after they have finished.
The behavior is the same for the shell executor on all platforms.
This problem is caused by builds where IsSharedEnv() is true, as well as builds run by e.g. the Docker executor when the /builds directory is host-mounted persistently.
Proposal
- Introduce a new FF_ENABLE_JOB_CLEANUP feature flag (disabled by default).
- Consolidate a new cleanup job stage with the existing cleanup_file_variables job stage. It is important for this phase to happen at the end of the job, when artifacts/cache have been uploaded, so that we can clean those up without interfering with the logic.
- Reuse the cleanup that is done at job startup and apply it to the project directory at the end of the job (subject to the FF_ENABLE_JOB_CLEANUP FF), while ensuring that no breaking behavior is introduced. The cleanup should depend on GIT_STRATEGY and should take sub-modules into consideration:

| GIT_STRATEGY value | Cleanup action |
| --- | --- |
| fetch | git clean ${GIT_CLEAN_FLAGS} && git reset --hard, also for sub-modules depending on GIT_SUBMODULE_STRATEGY |
| clone | rm -rf ${CI_PROJECT_DIR}/ |
| none | do nothing |

In short, the behavior depending on the FF would be the following:
- FF disabled: Clean up on start.
- FF enabled: Clean up on start and finish.
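For discussion, here is a rough shell sketch of how the end-of-job cleanup could follow the GIT_STRATEGY table. This is not the runner's actual implementation: the function names cleanup_project_dir and maybe_cleanup_at_job_end are made up for illustration, and the fallback values (fetch, -ffdx, none) are assumptions used when the variables are unset.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only; function names and fallback values are assumptions.

# Apply the cleanup action that matches GIT_STRATEGY to CI_PROJECT_DIR.
cleanup_project_dir() {
  local project_dir="${CI_PROJECT_DIR:?CI_PROJECT_DIR must be set}"
  local strategy="${GIT_STRATEGY:-fetch}"        # assumed fallback
  local clean_flags="${GIT_CLEAN_FLAGS:--ffdx}"  # assumed fallback

  case "$strategy" in
    fetch)
      # clean_flags is left unquoted on purpose so multiple flags split into words.
      git -C "$project_dir" clean $clean_flags
      git -C "$project_dir" reset --hard
      case "${GIT_SUBMODULE_STRATEGY:-none}" in
        normal)
          git -C "$project_dir" submodule foreach \
            "git clean $clean_flags && git reset --hard"
          ;;
        recursive)
          git -C "$project_dir" submodule foreach --recursive \
            "git clean $clean_flags && git reset --hard"
          ;;
      esac
      ;;
    clone)
      # ${var:?} aborts instead of expanding to "/" if the variable is empty.
      rm -rf "${project_dir:?}/"
      ;;
    none)
      : # do nothing
      ;;
    *)
      echo "unknown GIT_STRATEGY: $strategy" >&2
      return 1
      ;;
  esac
}

# Run the cleanup at the end of the job only when the feature flag is enabled.
maybe_cleanup_at_job_end() {
  if [ "${FF_ENABLE_JOB_CLEANUP:-false}" = "true" ]; then
    cleanup_project_dir
  fi
}
```

With the flag unset or false, maybe_cleanup_at_job_end does nothing, which matches the "FF disabled: Clean up on start" case. The existing start-of-job cleanup is outside this sketch.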
To evaluate the implementation, consider the following scenario of a shell executor running inside an alpine:latest Docker container:
.gitlab-ci.yml
job:
  script:
    - truncate -s 2G artifact.bin
    - df -k -h ${CI_BUILDS_DIR}
  artifacts:
    paths:
      - artifact.bin
config.toml
concurrent = 1
check_interval = 0
[session_server]
session_timeout = 1800
[[runners]]
  name = "shell runner"
  url = "https://gitlab.com/"
  executor = "shell"
  shell = "bash"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
Running this job multiple times should not result in increasing values reported in the Used column of the df tool output. The same should be true with concurrent > 1 once we've saturated the number of allowed concurrent builds (meaning that a directory has been created for each of the possible concurrent jobs under $CI_BUILDS_DIR/$CI_RUNNER_SHORT_TOKEN).
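To make the "Used column should not increase" check less manual, something like the following could be run on the runner host between job runs. It is only a sketch: used_kb and usage_grew are hypothetical helper names, and the tolerance argument is an assumption to allow for unrelated disk activity.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing disk usage between job runs.

# used_kb DIR: print the used space (in 1024-byte blocks) of the
# filesystem that holds DIR, using the portable df -P output format.
used_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $3 }'
}

# usage_grew BEFORE AFTER [TOLERANCE_KB]: succeed if AFTER exceeds BEFORE
# by more than TOLERANCE_KB (default 0).
usage_grew() {
  local before="$1" after="$2" tolerance="${3:-0}"
  [ $(( after - before )) -gt "$tolerance" ]
}
```

A possible way to use it: record before=$(used_kb "$CI_BUILDS_DIR") after the first run, trigger the job a few more times, record after=$(used_kb "$CI_BUILDS_DIR"), and treat usage_grew "$before" "$after" 1024 succeeding as a sign that the cleanup is not taking effect.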