GitLab Runner Autoscaling - Feedback issue for the new runner autoscaling solution
Summary of Feedback on GitLab Runner Autoscaling (GitLab Duo)
General Feedback
- Users are generally positive about the new autoscaling solution, with one customer noting they "solved all the lingering issues we had with docker-machine by switching to Fleeting"
- Users praised the architecture's support for running ARM/Graviton hosts with an Intel x86 orchestrator
- Users appreciate the ability to run Windows fleeting nodes with a Linux orchestrator
Configuration Issues
- AWS Region Configuration: Several users reported issues with missing AWS region configuration, requiring manual creation of config files
- Docker TLS Verification: When tls_verify = true, users encountered connection errors
- Resource Allocation: Users asked how to size CPU, memory, and storage for containers to ensure consistent performance
- Docker Registry Mirrors: Users needed guidance on configuring registry mirrors to avoid Docker Hub throttling
- Storage Options: Clarification needed on configuring storage limits via storage-opt and volume_driver_ops
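Several of the missing-region reports came down to the plugin starting with no region available anywhere. Assuming the fleeting AWS plugin resolves its region through the standard AWS SDK configuration chain, the Python sketch below shows a simplified version of that lookup: the AWS_REGION environment variable first, then the region key of the selected profile in the shared config file (~/.aws/config). The function name resolve_aws_region is hypothetical, and real SDKs consult additional sources and differ in details, so treat this as an illustration of why a missing config file leaves no region, not as the plugin's actual logic.

```python
import configparser
from typing import Mapping, Optional


def resolve_aws_region(
    env: Mapping[str, str], config_text: str, profile: str = "default"
) -> Optional[str]:
    """Simplified region lookup (hypothetical helper, not the plugin's code)."""
    # 1. An explicit environment variable wins.
    region = env.get("AWS_REGION")
    if region:
        return region

    # 2. Otherwise read the profile's section from the shared config file.
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # In ~/.aws/config, non-default profiles are written as "[profile NAME]".
    section = profile if profile == "default" else f"profile {profile}"
    if parser.has_section(section):
        return parser.get(section, "region", fallback=None)

    # 3. No region anywhere: the situation users reported.
    return None
```

In this sketch, a shared config file containing a [default] section with region = eu-west-1 is enough to resolve a region when AWS_REGION is unset.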
Performance Issues
- SSH Key Generation: Dynamic SSH key generation causes significant delays (30-35 seconds) compared to static keys (2-3 seconds)
- Warm Pools: Users reported significant differences in instance provisioning speed with warm pools vs. without
- Job Queue Delays: Some users experienced high job pending queues despite not reaching max concurrency
- Scaling Speed: With warm pools, it took 7 minutes to scale from 3 to 32 instances, while without warm pools it took only 1 minute to spawn 36 instances
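One way to reason about jobs that stay pending before max concurrency is reached is a simple capacity model: a job can start only on an instance that is already provisioned and has a free slot, so jobs queue while new instances are still booting. The Python sketch below borrows the docker-autoscaler setting names capacity_per_instance, max_instances, and idle_count from config.toml, but the formulas are a simplification assumed for illustration, not the runner's actual scaling algorithm. It also assumes that idle_count counts idle job slots rather than whole instances. The helper names desired_instances and jobs_left_waiting are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoscalerSettings:
    """Subset of docker-autoscaler settings (names from config.toml)."""

    capacity_per_instance: int  # concurrent jobs one instance can run
    max_instances: int  # hard cap on the number of instances
    idle_count: int  # ASSUMPTION: idle job slots to keep free


def desired_instances(s: AutoscalerSettings, busy_slots: int, pending_jobs: int) -> int:
    """Instances needed for busy + pending jobs plus idle headroom, capped at max_instances."""
    needed_slots = busy_slots + pending_jobs + s.idle_count
    return min(s.max_instances, math.ceil(needed_slots / s.capacity_per_instance))


def jobs_left_waiting(
    s: AutoscalerSettings, ready_instances: int, busy_slots: int, pending_jobs: int
) -> int:
    """Pending jobs that have no free slot on an instance that is already ready."""
    free_slots = max(0, ready_instances * s.capacity_per_instance - busy_slots)
    return max(0, pending_jobs - free_slots)
```

For example, with capacity_per_instance = 2, max_instances = 20, idle_count = 2, 10 busy slots, and 8 pending jobs, the model asks for 10 instances. If only 6 of them are ready, 6 jobs keep waiting even though the desired count is well below max_instances; how long they wait depends on provisioning speed, which is where the warm pool and SSH key timings above come in.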
Stability Issues
- Instance Termination: Instances sometimes get terminated while still running jobs
- Connection Errors: Users reported various connection errors like "EC2 Instance Connect is not supported on a terminated instance"
- Heartbeat Checks: The feature flag FF_USE_FLEETING_ACQUIRE_HEARTBEATS was introduced to help with instance connectivity issues
- ASG Rebalancing: Disabling the AZRebalance process on ASGs resolved issues with hanging jobs
- API Rate Limiting: Excessive AWS API calls can lead to rate limiting, causing service disruptions
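Suspending AZRebalance on the runner's Auto Scaling Group can be scripted. The Python sketch below uses the boto3 Auto Scaling client calls suspend_processes and describe_auto_scaling_groups; the helper names suspend_az_rebalance and az_rebalance_suspended are hypothetical. The client is passed in so the helpers can be exercised with a fake client that records calls instead of talking to AWS.

```python
from typing import Any, Protocol


class AutoScalingClient(Protocol):
    """The subset of the boto3 Auto Scaling client used here."""

    def suspend_processes(self, **kwargs: Any) -> Any: ...

    def describe_auto_scaling_groups(self, **kwargs: Any) -> Any: ...


def suspend_az_rebalance(client: AutoScalingClient, asg_name: str) -> None:
    # Stop the ASG from rebalancing across Availability Zones, which may
    # terminate instances that are still running jobs.
    client.suspend_processes(
        AutoScalingGroupName=asg_name,
        ScalingProcesses=["AZRebalance"],
    )


def az_rebalance_suspended(client: AutoScalingClient, asg_name: str) -> bool:
    # Check the group's SuspendedProcesses list for AZRebalance.
    resp = client.describe_auto_scaling_groups(AutoScalingGroupNames=[asg_name])
    groups = resp.get("AutoScalingGroups", [])
    if not groups:
        return False
    return any(
        p.get("ProcessName") == "AZRebalance"
        for p in groups[0].get("SuspendedProcesses", [])
    )
```

Against real AWS, the client would be created with boto3.client("autoscaling") and valid credentials, for example suspend_az_rebalance(boto3.client("autoscaling"), "my-runner-asg"), where the group name is a placeholder.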
Feature Requests
- AWS Warm Pools Support: Better integration with AWS warm pools to reduce startup times
- Multiple Runners per ASG: Support for sharing the same Auto Scaling Group across different runners
- Disk Space Management: Better handling of "No Space Left on Device" errors
- Graceful Shutdown: Improved handling of SIGTERM for proper cleanup
- State Persistence: Ability to persist state between processes for rolling deployments
- Throttling Control: More control over API call frequency to avoid rate limiting
- Error Handling: Better detection when scaling fails to prevent continuous failed attempts
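The Throttling Control and Error Handling requests both concern how failed cloud API calls are retried. A common pattern is capped exponential backoff with full jitter, combined with a hard limit on attempts so that a failing scale-up is eventually reported instead of retried forever. The Python sketch below illustrates that general pattern only; ThrottledError and call_with_backoff are hypothetical names, and this is not a description of how GitLab Runner or the fleeting plugins currently retry.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for a cloud API 'rate exceeded' error."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 20.0,
    sleep: Callable[[float], None] = time.sleep,
    rand: Callable[[float, float], float] = random.uniform,
) -> T:
    """Call fn, retrying on ThrottledError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_attempts - 1:
                # Out of attempts: surface the error instead of retrying forever.
                raise
            # Exponential ceiling (0.5 s, 1 s, 2 s, ...) capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: wait a random time in [0, ceiling] to spread out callers.
            sleep(rand(0.0, ceiling))
    raise RuntimeError("unreachable")
```

Injecting sleep and rand keeps the function testable. If rand always returns the upper bound, a call that is throttled twice and then succeeds waits 0.5 s and then 1.0 s before its second and third attempts.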
Cloud Provider Specific Feedback
- AWS: Most feedback was related to AWS, with specific issues around spot instances, ASG configuration, and API limits
- Azure: Issues with WinRM configuration on Windows VMs and VMSS scaling problems when hitting resource quotas
- GCP: Some users reported that instance groups scale down but never scale up
Documentation Needs
- Better documentation on ASG setup requirements (like disabling AZRebalance)
- Clearer explanation of configuration parameters and their interactions
- More examples for different cloud providers
- Documentation on pros/cons of dynamic vs. static SSH key usage
The feedback shows that while the new autoscaling solution offers significant improvements over docker-machine, there are still areas that need refinement, particularly around stability, performance, and documentation.
Sources: GitLab Runner Autoscaling - Feedback issue for the new runner autoscaling solution
Background
As of GitLab Runner 15.11, the new Docker Autoscaler, Instance executor, and fleeting plugin for AWS are available.
The technical details for this new autoscaling solution are documented in the Next Runner autoscaling architecture blueprint.
Please add comments below with your feedback or questions.