Add slot-based cgroup support for Docker executor

What does this MR do?

Add configuration options and implementation to use taskscaler slot numbers for dynamic cgroup naming, enabling persistent resource pools per slot.

  • Add UseSlotCgroups, SlotCgroupTemplate, ServiceSlotCgroupTemplate config fields
  • Implement getCgroupParent() and getServiceCgroupParent() with slot resolution (sketched after this list)
  • Add GetAcquisition() method to autoscaler AcquisitionRef for slot access
  • Update createHostConfig methods to use dynamic cgroup resolution
  • Add unit tests for all slot cgroup functionality
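
For reference, here is a minimal sketch of how the slot number could be substituted into the cgroup templates, assuming a {slot} placeholder as in the runner configuration shown later. The struct and function names below are illustrative only, not the actual runner code.

// Sketch only: slot-based cgroup parent resolution with a "{slot}" placeholder.
package main

import (
    "fmt"
    "strconv"
    "strings"
)

// Illustrative stand-in for the new Docker executor config fields.
type dockerConfig struct {
    UseSlotCgroups            bool
    CgroupParent              string // existing static setting
    SlotCgroupTemplate        string
    ServiceSlotCgroupTemplate string
}

// resolveCgroupParent substitutes the acquired slot number into the template.
// It falls back to the static cgroup parent when slot cgroups are disabled,
// no slot is available (job not running under the autoscaler), or the
// template is empty.
func resolveCgroupParent(cfg dockerConfig, template string, slot int, haveSlot bool) string {
    if !cfg.UseSlotCgroups || !haveSlot || template == "" {
        return cfg.CgroupParent
    }
    return strings.ReplaceAll(template, "{slot}", strconv.Itoa(slot))
}

func main() {
    cfg := dockerConfig{
        UseSlotCgroups:            true,
        SlotCgroupTemplate:        "runner-slot-{slot}.slice",
        ServiceSlotCgroupTemplate: "runner-slot-{slot}.slice",
    }
    // Slot 2 acquired from the taskscaler acquisition for this job.
    fmt.Println(resolveCgroupParent(cfg, cfg.SlotCgroupTemplate, 2, true))        // runner-slot-2.slice
    fmt.Println(resolveCgroupParent(cfg, cfg.ServiceSlotCgroupTemplate, 2, true)) // runner-slot-2.slice
}

The resolved value is then used as the container's cgroup parent when building the Docker host config, so the build container and its service containers land in the same slot slice.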

Why was this MR needed?

Enables persistent resource isolation by placing all containers for a job (build and services) into the same slot-derived cgroup. This allows administrators to pre-create cgroups for each slot, providing consistent resource allocation and isolation for entire jobs across executions.

What's the best way to test this MR?

This MR has unit tests, but I also ran a load test with the static fleeting plugin to verify that a job could be constrained to a single slot's worth of resources.

Test Environment

  • VM: 8 CPUs, 31GB RAM
  • Target: 4 slots with 2 CPUs each
  • GitLab Runner: docker-autoscaler executor with slot-based cgroup feature

Cgroup Configuration

Created systemd slices with CPU affinity and resource limits:

# Slot assignments
sudo systemctl set-property --runtime runner-slot-0.slice CPUQuota=200% AllowedCPUs=0,1
sudo systemctl set-property --runtime runner-slot-1.slice CPUQuota=200% AllowedCPUs=2,3
sudo systemctl set-property --runtime runner-slot-2.slice CPUQuota=200% AllowedCPUs=4,5
sudo systemctl set-property --runtime runner-slot-3.slice CPUQuota=200% AllowedCPUs=6,7

# Memory limits (2GB per slot)
for i in {0..3}; do
    sudo systemctl set-property --runtime runner-slot-$i.slice MemoryMax=2G
done

GitLab Runner Configuration

[runners.docker]
  use_slot_cgroups = true
  slot_cgroup_template = "runner-slot-{slot}.slice"
  service_slot_cgroup_template = "runner-slot-{slot}.slice"

[runners.autoscaler]
  plugin = "fleeting-plugin-static"
  capacity_per_instance = 4  # 4 slots per VM
  max_instances = 1          # Single VM with 4 slots

  [runners.autoscaler.plugin_config]
    path = "/path/to/instances.json"

Static Plugin Instance Configuration

{
    "cgroup-slot-test": {
        "os": "linux",
        "arch": "amd64",
        "protocol": "ssh",
        "username": "josephburnett",
        "key_path": "/home/josephburnett/.ssh/id_ed25519",
        "internal_addr": "10.128.0.33"
    }
}

CPU Burn Test

Used a GitLab CI job that attempts to use 8 CPUs for 15 seconds:

job:
  script: for i in {1..8}; do timeout 15s yes > /dev/null & done ; wait

CPU usage on the VM was monitored with htop during job execution.

Results:

  • Without cgroup: Used all 8 CPUs
  • In slot-0 cgroup: Limited to CPUs 0,1 only

What are the relevant issue numbers?

N/A
