Allow cgroup parent to be configured for docker executor
Description
Allow the cgroup parent to be configured for docker executor.
Problem
Customers cannot easily configure fine-tuned resource control for GitLab jobs on shared docker hosts. Cgroup configurations solve this problem, but the relevant option (--cgroup-parent) is not available for the GitLab runner docker executor.
For example, suppose you have a host with 32GB of physical memory. Job demand for memory can vary widely, from just a few MB to multiple GB. To make the best use of system resources, the runner has concurrency enabled. However, no single per-job memory limit works well in this setup.
If the memory limit is too high, the sum of memory usage among all jobs may exceed the physical memory available on the host. This can cause essential services (including docker itself) to fail due to OOM conditions, which can be fatal to the GitLab runner itself or to other system agents such as the AWS ECS agent.
If the memory limit is too low, jobs may be throttled or OOM-killed even though physical memory is available on the host. Resources are poorly allocated in both cases because limits are placed per container rather than on a shared resource group.
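The trade-off can be made concrete with a little arithmetic. The sketch below uses the 32GB host from the example; the concurrency of 8 and the two per-job limits (8GB and 2GB) are assumed values chosen only for illustration.

```python
# Worked arithmetic for the 32 GB host described above.
# CONCURRENT_JOBS and both per-job limits are assumptions for illustration.
HOST_MEMORY_GB = 32
CONCURRENT_JOBS = 8  # assumed runner concurrency


def worst_case_usage_gb(per_job_limit_gb, jobs=CONCURRENT_JOBS):
    """Total memory the jobs could claim if every job reaches its own limit."""
    return per_job_limit_gb * jobs


# Limit too high: 8 jobs x 8 GB = 64 GB, twice the physical memory.
high_limit_total = worst_case_usage_gb(8)
overcommitted = high_limit_total > HOST_MEMORY_GB

# Limit too low: 8 jobs x 2 GB = 16 GB, so jobs can never use the other 16 GB,
# and a single job that needs 3 GB is killed even on an otherwise idle host.
low_limit_total = worst_case_usage_gb(2)
stranded_gb = HOST_MEMORY_GB - low_limit_total

print(high_limit_total, overcommitted, low_limit_total, stranded_gb)
```

Under these assumptions, neither limit is satisfactory: one risks exhausting the host, the other leaves half of it unusable by jobs.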
Proposal
The --cgroup-parent option allows shared resource groups to be configured. This is an indispensable capability for runners on shared docker hosts, because it allows optimal distribution of host resources while ensuring proper constraints on resource usage.
Cgroups allow for more fine-grained control than the options exposed directly through docker (or the even more limited runner options), and they allow those resource controls to be applied to all descendants within the group (for example, containers launched by the runner).
For example, to solve the problem scenario above, GitLab customers will be able to specify memory limits for the entire cgroup shared by all jobs without segmenting host memory on a per-job basis. This way, customers can set a maximum amount of memory to be available to all jobs without setting hard limits on individual jobs. This can help ensure system stability while allowing fluid distribution of available system resources.
Memory limits are only one of the configurations available via cgroups.
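As a rough illustration of what the executor would need to pass through, the sketch below builds a `docker run` command line that places a job container under a shared parent cgroup. `--cgroup-parent` is an existing `docker run` flag; the helper `docker_run_args`, the image, and the parent name `ci-jobs.slice` are hypothetical. The sketch assumes the parent cgroup already exists and that its memory limit is configured outside docker (for example through systemd or the cgroup filesystem). With the systemd cgroup driver the parent must be a slice name ending in `.slice`; with the cgroupfs driver it would be a path such as `/ci-jobs`.

```python
import shlex


def docker_run_args(image, command, cgroup_parent):
    """Build a `docker run` argv that places the container under a shared parent cgroup.

    Hypothetical helper: it only assembles the arguments and does not start a container.
    """
    return [
        "docker", "run", "--rm",
        "--cgroup-parent", cgroup_parent,  # existing docker run flag
        image,
        *command,
    ]


# "ci-jobs.slice" is an assumed name for a parent cgroup created by the host admin.
args = docker_run_args("alpine:3.19", ["sh", "-c", "echo hello"], "ci-jobs.slice")
print(shlex.join(args))
```

Every container started this way is a descendant of the same group, so a single limit on the parent bounds all jobs together without a hard limit on any one job.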
Workarounds that are not acceptable
Use the docker daemon configuration
It is possible to configure a default cgroup parent in the docker daemon configuration. However, this is unacceptable because:
- Customers may have existing workloads that must run outside the cgroup parent (e.g. the AWS ECS agent)
- Customers want to configure multiple cgroups (e.g. multiple tiers of runners with different constraints, or other purposes entirely)
- Customers may use the default daemon configuration for other purposes
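For reference, the daemon-level workaround amounts to a single host-wide setting. The sketch below generates the relevant `daemon.json` fragment; `cgroup-parent` is the daemon configuration key, while the value `ci-jobs.slice` is an assumed example name.

```python
import json

# Daemon-wide default: applies to every container that does not override it,
# which is why it cannot express several runner tiers or exempt other workloads.
# "ci-jobs.slice" is an assumed example value.
daemon_config = {"cgroup-parent": "ci-jobs.slice"}

daemon_json = json.dumps(daemon_config, indent=2)
print(daemon_json)
```

Because the file holds only one default value, all three objections above follow directly: there is no per-runner or per-workload granularity at this level.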