GitLab Runner Autoscaling - Feedback issue for the new runner autoscaling solution



Summary of Feedback on GitLab Runner Autoscaling (summarized by GitLab Duo)

General Feedback

Users are generally positive about the new autoscaling solution; one customer noted they "solved all the lingering issues we had with docker-machine by switching to Fleeting"
Users praised the architecture for allowing ARM/Graviton job hosts to run with an Intel x86 orchestrator
Users appreciate being able to run Windows fleeting nodes with a Linux orchestrator

Configuration Issues

AWS Region Configuration: Several users hit errors because no AWS region was configured and had to create AWS config files manually as a workaround
Docker TLS Verification: Users encountered connection errors when tls_verify = true
Resource Allocation: Users asked how to determine CPU, memory, and storage allocations for job containers to get consistent performance
Docker Registry Mirrors: Users needed guidance on configuring registry mirrors to avoid Docker Hub rate limiting
Storage Options: Users asked for clarification on setting storage limits with storage-opt and volume_driver_ops
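Several users worked around the missing region by writing an AWS config file themselves. The Python sketch below renders a minimal AWS shared config file that sets a region for one profile. It assumes the plugin resolves its region through the standard AWS SDK shared config file (commonly ~/.aws/config, or the path in the AWS_CONFIG_FILE environment variable); the function name render_aws_config is hypothetical, and how the file is referenced from the runner configuration is not shown.

```python
import configparser
import io


def render_aws_config(region: str, profile: str = "default", existing: str = "") -> str:
    """Return AWS shared config text with `region` set for `profile`.

    In the shared config file the default profile is the [default] section;
    any other profile is written as [profile <name>].
    """
    section = "default" if profile == "default" else f"profile {profile}"
    config = configparser.ConfigParser()
    config.read_string(existing)  # keep any sections that already exist
    if not config.has_section(section):
        config.add_section(section)
    config.set(section, "region", region)
    out = io.StringIO()
    config.write(out)
    return out.getvalue()
```

To update a real file, read its current contents, pass them as existing, and write the result back so other profiles survive. Note that configparser does not preserve comments, so any comments in the existing file would be lost.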

Performance Issues

SSH Key Generation: Dynamic SSH key generation causes significant delays (30-35 seconds) compared to static keys (2-3 seconds)
Warm Pools: Users reported large differences in instance provisioning speed with and without AWS warm pools
Job Queue Delays: Some users saw long queues of pending jobs even though maximum concurrency had not been reached
Scaling Speed: With warm pools, scaling from 3 to 32 instances took 7 minutes; without warm pools, spawning 36 instances took only 1 minute
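To put the reported figures side by side, the sketch below converts them into rates and ratios. These are single anecdotal reports rather than benchmarks, and the sketch assumes the 36 instances in the run without warm pools were all newly launched.

```python
def scale_rate(start: int, end: int, minutes: float) -> float:
    """Instances added per minute when a group grows from `start` to `end`."""
    return (end - start) / minutes


# Figures as reported in the feedback (anecdotal, single runs).
warm = scale_rate(3, 32, 7)   # 29 instances in 7 minutes with warm pools
cold = scale_rate(0, 36, 1)   # 36 new instances in 1 minute without warm pools
speedup = cold / warm         # how much faster the run without warm pools scaled

# Connection setup: dynamic keys 30-35 s vs static keys 2-3 s.
# Smallest slowdown pairs the fastest dynamic time with the slowest static time.
ssh_slowdown = (30 / 3, 35 / 2)  # (min, max) slowdown factor

print(f"warm pools: {warm:.1f}/min, no warm pools: {cold:.1f}/min, x{speedup:.1f}")
print(f"dynamic SSH keys are {ssh_slowdown[0]:.0f}x to {ssh_slowdown[1]:.1f}x slower")
```

On these numbers the run without warm pools scaled roughly 8.7 times faster (about 4.1 versus 36 instances per minute), and dynamic key generation is 10 to 17.5 times slower than static keys.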

Stability Issues

Instance Termination: Instances sometimes get terminated while still running jobs
Connection Errors: Users reported various connection errors, such as "EC2 Instance Connect is not supported on a terminated instance"
Heartbeat Checks: The feature flag FF_USE_FLEETING_ACQUIRE_HEARTBEATS was introduced to help with instance connectivity issues
ASG Rebalancing: Disabling the AZRebalance process on ASGs resolved issues with hanging jobs
API Rate Limiting: Excessive AWS API calls can trigger rate limiting (throttling), which disrupts service
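A common client-side mitigation for throttled cloud APIs is to retry with exponential backoff and jitter. The Python sketch below shows the "full jitter" variant: the n-th retry waits a random time between 0 and min(cap, base * 2**n) seconds. It is a generic illustration, not how the fleeting plugins or the AWS SDKs actually retry; ThrottledError stands in for whatever throttling error a provider returns, and all names and default values are hypothetical.

```python
import random
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for a provider's 'rate exceeded' error."""


def backoff_delays(base: float, cap: float, attempts: int,
                   rng: random.Random) -> List[float]:
    """Full jitter: delay n is uniform in [0, min(cap, base * 2**n)]."""
    return [rng.uniform(0, min(cap, base * 2 ** n)) for n in range(attempts)]


def call_with_backoff(fn: Callable[[], T], base: float = 0.5, cap: float = 20.0,
                      attempts: int = 5, rng: Optional[random.Random] = None,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on ThrottledError, with at most `attempts` calls in total."""
    rng = rng or random.Random()
    for delay in backoff_delays(base, cap, attempts - 1, rng):
        try:
            return fn()
        except ThrottledError:
            sleep(delay)  # wait before the next attempt
    return fn()  # final attempt: any error now propagates to the caller
```

Randomizing the waits keeps many runners from retrying in lockstep, which is what tends to re-trigger the limit. Backoff only softens bursts; lowering the baseline call rate, as the throttling-control request asks, is the other half of the problem.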

Feature Requests

AWS Warm Pools Support: Better integration with AWS warm pools to reduce startup times
Multiple Runners per ASG: Support for sharing the same Auto Scaling Group across different runners
Disk Space Management: Better handling of "No Space Left on Device" errors
Graceful Shutdown: Improved handling of SIGTERM for proper cleanup
State Persistence: Ability to persist state across runner processes to support rolling deployments
Throttling Control: More control over API call frequency to avoid rate limiting
Error Handling: Better detection of failed scaling so the autoscaler does not keep retrying indefinitely
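The last request, stopping repeated failed scaling attempts, is often handled with a circuit breaker: after a number of consecutive failures, scale-up requests pause for a cooldown period. The Python sketch below is a hypothetical illustration of that pattern, not a description of how GitLab Runner's autoscaler behaves; ScaleUpBreaker and its defaults are made up for the example, and the clock is injectable so the logic can be tested.

```python
import time
from typing import Callable, Optional


class ScaleUpBreaker:
    """Pause scale-up requests after repeated failures (hypothetical helper)."""

    def __init__(self, max_failures: int = 3, cooldown_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0                  # consecutive failed attempts
        self.opened_at: Optional[float] = None  # when the breaker tripped

    def allow(self) -> bool:
        """Return True if a scale-up attempt may be made now."""
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown_s:
            # Cooldown elapsed: close the breaker, but leave it one failure
            # away from tripping again.
            self.opened_at = None
            self.failures = self.max_failures - 1
            return True
        return False

    def record(self, success: bool) -> None:
        """Record the outcome of a scale-up attempt."""
        if success:
            self.failures = 0
            self.opened_at = None
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()
```

For example, when a VMSS or an AWS account has hit a resource quota, a breaker like this turns a tight loop of failing requests into one trial request per cooldown period.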

Cloud Provider Specific Feedback

AWS: Most feedback was related to AWS, with specific issues around spot instances, ASG configuration, and API limits
Azure: Issues with WinRM configuration on Windows VMs and VMSS scaling problems when hitting resource quotas
GCP: Some users reported that instance groups scale down but never scale up

Documentation Needs

Better documentation of ASG setup requirements, such as disabling AZRebalance
Clearer explanation of configuration parameters and how they interact
More examples for different cloud providers
Documentation on the pros and cons of dynamic vs. static SSH keys
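As an example of the ASG setup step users asked to have documented: AZRebalance lets an Auto Scaling group launch and terminate instances to even out capacity across Availability Zones, which is a plausible cause of instances disappearing mid-job. The Python sketch below suspends only that process on one group through the Auto Scaling SuspendProcesses API. It takes the client as a parameter, so a boto3 client (boto3.client("autoscaling")) can be passed in production and a fake in tests; suspend_az_rebalance and any group name are hypothetical. The AWS CLI equivalent is aws autoscaling suspend-processes --auto-scaling-group-name <name> --scaling-processes AZRebalance.

```python
from typing import Any, Protocol


class AutoScalingClient(Protocol):
    """The one method this sketch needs; boto3's Auto Scaling client has it."""

    def suspend_processes(self, **kwargs: Any) -> Any: ...


def suspend_az_rebalance(client: AutoScalingClient, asg_name: str) -> None:
    """Suspend only AZRebalance on one Auto Scaling group.

    Other processes (Launch, Terminate, HealthCheck, ...) are left running.
    """
    client.suspend_processes(
        AutoScalingGroupName=asg_name,
        ScalingProcesses=["AZRebalance"],
    )
```

Passing the client in keeps the function free of credentials and region handling, which stay with however the boto3 session is configured.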

The feedback shows that while the new autoscaling solution offers significant improvements over docker-machine, there are still areas that need refinement, particularly around stability, performance, and documentation.

Sources: GitLab Runner Autoscaling - Feedback issue for the new runner autoscaling solution

Background

As of GitLab Runner 15.11, the new Docker Autoscaler executor, Instance executor, and fleeting plugin for AWS are available.

The technical details for this new autoscaling solution are documented in the Next Runner autoscaling architecture blueprint.

Please add your feedback or questions as comments below.

Edited Jun 13, 2025 by 🤖 GitLab Bot 🤖