Add volume_keep option to Docker executor
What does this MR do?
This MR adds a new configuration option, VolumeKeep (set as volume_keep in the runner configuration), to GitLab Runner's Docker executor. It lets users disable automatic volume removal when containers are cleaned up. When enabled, Docker volumes persist after container removal instead of being deleted automatically.
Why was this MR needed?
A customer has been seeing significant performance issues with Docker volume cleanup in high-concurrency environments:
- Volume cleanup was taking minutes per job or failing with cleanup errors
- Jobs were hanging at completion, so actual execution time did not match reported time (jobs took 2-3 minutes against an expected ~1:40)
- The Docker daemon was being blocked during volume removal operations, preventing other commands from executing
- With 128 parallel jobs using docker-in-docker with complex layered volumes, cleanup operations were creating a bottleneck
Testing showed that disabling volume removal reduced average job time from 2-3 minutes to 1 minute 40 seconds with minimal variation.
What's the best way to test this MR?
- Set up a high-concurrency pipeline (e.g., 128 parallel jobs) using docker-in-docker
- Configure the runner with volume_keep = true in the Docker executor configuration
- Run the pipeline and measure job completion times
- Verify that volumes persist after job completion using docker volume ls
- Implement an asynchronous cleanup script (example provided in the work item) to remove old volumes via cron without blocking the Docker daemon
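For the asynchronous cleanup step, a minimal sketch of what a cron-driven script might look like is shown here. It is not the example from the work item. The six-hour retention window, the choice to touch only dangling volumes, and the helper names (select_stale, list_dangling_volumes) are all assumptions made for illustration. It also assumes the volume's CreatedAt field is an RFC 3339 timestamp with whole seconds, as the local volume driver typically reports.

```python
#!/usr/bin/env python3
"""Sketch: remove old, unused Docker volumes left behind by volume_keep = true."""
import subprocess
from datetime import datetime, timedelta, timezone

# Hypothetical retention window; tune it to the pipeline's job duration.
MAX_AGE = timedelta(hours=6)


def parse_created_at(value):
    """Parse Docker's RFC 3339 CreatedAt string into an aware datetime."""
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(value.strip().replace("Z", "+00:00"))


def select_stale(volumes, now, max_age=MAX_AGE):
    """Return the names of volumes created more than max_age before now."""
    return [name for name, created in volumes
            if now - parse_created_at(created) > max_age]


def docker(*args):
    """Run a docker CLI command and return its standard output."""
    result = subprocess.run(["docker", *args], check=True,
                            capture_output=True, text=True)
    return result.stdout


def list_dangling_volumes():
    """Return (name, CreatedAt) pairs for volumes no container references."""
    names = docker("volume", "ls", "-q", "--filter", "dangling=true").split()
    return [(name, docker("volume", "inspect", "--format",
                          "{{.CreatedAt}}", name))
            for name in names]


def main():
    now = datetime.now(timezone.utc)
    for name in select_stale(list_dangling_volumes(), now):
        # One volume per call; a failed removal (for example, a volume that
        # came back into use) is ignored and retried on the next cron run.
        subprocess.run(["docker", "volume", "rm", name], check=False)


if __name__ == "__main__":
    main()
```

Run from cron during a quiet period, this keeps removal out of the job's critical path. Whether it avoids daemon contention in a given environment would still need to be measured.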
What are the relevant issue numbers?
Edited by Ashvin Sharma