Add volume_keep option to Docker executor

What does this MR do?

This MR adds a new configuration option, VolumeKeep (volume_keep in config.toml), to GitLab Runner's Docker executor. It lets users disable automatic volume removal when containers are cleaned up: when enabled, Docker volumes persist after container removal instead of being deleted automatically.
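A minimal sketch of how such an option could gate volume removal. `DockerConfig`, `RemoveOptions` and `removeOptions` are hypothetical stand-ins, not the Runner's actual types; `RemoveVolumes` mirrors the Docker Engine API's `v` query parameter on container removal, which removes the anonymous volumes associated with the container. The real implementation may be structured differently.

```go
package main

import "fmt"

// DockerConfig is a hypothetical, trimmed-down stand-in for the Docker
// executor's configuration; only the new field is shown.
type DockerConfig struct {
	// VolumeKeep maps to `volume_keep` in config.toml. When true, volumes
	// are left in place after the job's containers are removed.
	VolumeKeep bool `toml:"volume_keep,omitempty"`
}

// RemoveOptions is a hypothetical local mirror of the options sent when
// removing a container. RemoveVolumes corresponds to the Docker Engine
// API's `v` query parameter on container removal.
type RemoveOptions struct {
	Force         bool
	RemoveVolumes bool
}

// removeOptions derives container-removal options from the config:
// volumes are removed together with the container unless VolumeKeep is set.
func removeOptions(cfg DockerConfig) RemoveOptions {
	return RemoveOptions{
		Force:         true,
		RemoveVolumes: !cfg.VolumeKeep,
	}
}

func main() {
	fmt.Printf("%+v\n", removeOptions(DockerConfig{}))                 // {Force:true RemoveVolumes:true}
	fmt.Printf("%+v\n", removeOptions(DockerConfig{VolumeKeep: true})) // {Force:true RemoveVolumes:false}
}
```

Because a Go bool defaults to false, an existing config without `volume_keep` keeps the current behavior; setting it to true flips only volume removal and leaves forced container removal unchanged.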

Why was this MR needed?

A customer has been seeing significant performance issues with Docker volume cleanup in high-concurrency environments:

  • Volume cleanup was taking minutes per job or failing with cleanup errors
  • Jobs were hanging at completion, with the actual execution time not matching the reported time (jobs took 2-3 minutes vs. the expected ~1 minute 40 seconds)
  • The Docker daemon was being blocked during volume removal operations, preventing other commands from executing
  • With 128 parallel jobs using docker-in-docker with complex layered volumes, cleanup operations were creating a bottleneck

Testing showed that disabling volume removal reduced average job time from 2-3 minutes to 1 minute 40 seconds with minimal variation.

What's the best way to test this MR?

  1. Set up a high-concurrency pipeline (e.g., 128 parallel jobs) using docker-in-docker
  2. Configure the runner with volume_keep = true in the [runners.docker] section of config.toml
  3. Run the pipeline and measure job completion times
  4. Verify that volumes persist after job completion using docker volume ls
  5. Set up an asynchronous cleanup script (an example is provided in the work item) that removes old volumes on a cron schedule, without blocking the Docker daemon during job completion
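The work item's script is not reproduced here. As a hypothetical alternative, the sketch below reads volume names on stdin (for example from `docker volume ls -q --filter dangling=true`, which lists volumes not referenced by any container) and removes them with `docker volume rm` in small batches with a pause in between. The batch size, the pause, and the `batchVolumes` helper are illustrative assumptions, not tuned values.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"os/exec"
	"strings"
	"time"
)

// batchVolumes groups volume names into slices of at most size entries,
// skipping blank lines, so each `docker volume rm` invocation stays small.
func batchVolumes(names []string, size int) [][]string {
	if size < 1 {
		size = 1
	}
	var batches [][]string
	var cur []string
	for _, n := range names {
		n = strings.TrimSpace(n)
		if n == "" {
			continue
		}
		cur = append(cur, n)
		if len(cur) == size {
			batches = append(batches, cur)
			cur = nil
		}
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches
}

func main() {
	// Volume names arrive on stdin, one per line.
	var names []string
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		names = append(names, sc.Text())
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "reading volume names:", err)
		os.Exit(1)
	}

	// Hypothetical tuning values: 20 volumes per call and a 2s pause
	// between calls, to spread removals out rather than issue them at once.
	const batchSize = 20
	const pause = 2 * time.Second

	for i, batch := range batchVolumes(names, batchSize) {
		if i > 0 {
			time.Sleep(pause)
		}
		args := append([]string{"volume", "rm"}, batch...)
		cmd := exec.Command("docker", args...)
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		if err := cmd.Run(); err != nil {
			// A volume may be in use again or already gone; log and continue.
			fmt.Fprintln(os.Stderr, "docker volume rm failed:", err)
		}
	}
}
```

Built as a hypothetical `volume-cleanup` binary, it could run from cron as `docker volume ls -q --filter dangling=true | volume-cleanup`. Note that `dangling=true` matches every unused volume on the host, not only runner volumes, so on shared hosts narrow the listing (for example with a `name` filter).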

What are the relevant issue numbers?

Job failing with "set volume permissions" (#38679)

Edited by Ashvin Sharma
