macOS Nesting Autoscaling SSH connection error with 16.11+
Summary
After upgrading GitLab Runner from 16.10 to 16.11, our macOS fleeting autoscaling stopped working. Whenever fleeting attempted to connect to our Tart VMs, the job failed with the following error:
ERROR: Preparation failed: creating instance environment: creating vm tunnel: dialing nesting vm: dial ssh: after retrying 0 times: setting ssh dial deadline: ssh: tcpChan: deadline not supported
After we downgraded back to 16.10, the issue went away and our VMs worked fine.
I have also tried upgrading to the latest 17.x runner release, and the issue persists.
Steps to reproduce
Upgrade GitLab Runner to 16.11.1 or later, then run a job (such as the one below) on a macOS nesting VM runner.
.gitlab-ci.yml
test:
  stage: test
  image: macos-14-xcode-15
  tags:
    - mac
    - runner
  script: echo "test"
Actual behavior
Running with gitlab-runner 16.11.1 (fe451d5a)
on mac WTHtNKVWm, system ID: s_0a25e0673458
feature flags: FF_USE_FASTZIP:true
Resolving secrets
Preparing the "instance" executor 00:53
Preparing instance...
Dialing instance i-0da5c08e56a969ff4...
Instance i-0da5c08e56a969ff4 connected
Enforcing VM Isolation
Creating nesting VM tunnel
Creating nesting VM macos-14-xcode-15
Dialing nesting daemon
Created nesting VM nesting-3ej32x82 192.168.64.3
ERROR: Preparation failed: creating instance environment: creating vm tunnel: dialing nesting vm: dial ssh: after retrying 0 times: setting ssh dial deadline: ssh: tcpChan: deadline not supported
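Expected behavior
The runner connects to the nesting VM and the job runs, as it does on 16.10 (see the job log below).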
Relevant logs and/or screenshots
Job log from 16.10.0, where the same job succeeds:
Running with gitlab-runner 16.10.0 (81ab07f6)
on mac WTHtNKVWm, system ID: s_255526db87a4
feature flags: FF_USE_FASTZIP:true
Resolving secrets 00:00
Preparing the "instance" executor 04:48
Preparing instance...
Dialing instance i-09ea846a2497aed4e...
Instance i-09ea846a2497aed4e connected
Enforcing VM Isolation
Creating nesting VM tunnel
Dialing nesting daemon
Creating nesting VM macos-14-xcode-15
Created nesting VM nesting-0a8brmdx 192.168.64.3
Preparing environment 00:08
Running on ip-192-168-64-3.ec2.internal via ip-10-101-1-28.ec2.internal...
Getting source from Git repository 00:35
Fetching changes with git depth set to 20...
Initialized empty Git repository in /Users/admin/builds/WTHtNKVWm/0/zachk/test/.git/
Created fresh repository.
failed to store: -25308
Checking out b0948f0c as detached HEAD (ref is main)...
Skipping Git submodules setup
Executing "step_script" stage of the job script 00:01
$ echo "test"
test
Cleaning up project directory and file based variables 00:00
Job succeeded
Environment description
config.toml contents
[[runners]]
  name = "MacOS Autoscaler"
  executor = "instance"
  environment = [
    "FF_USE_FASTZIP=true",
    "ARTIFACT_COMPRESSION_LEVEL=default",
    "CACHE_COMPRESSION_LEVEL=fastest",
    "TRANSFER_METER_FREQUENCY=2s"
  ]
  [runners.instance]
    allowed_images = ["*"] # allow any nesting image
  [runners.custom_build_dir]
    enabled = true
  [runners.autoscaler]
    capacity_per_instance = 2 # Apple Silicon can only support 2 VMs per host
    max_use_count = 0
    plugin = "fleeting-plugin-aws"
    delete_instances_on_shutdown = true
    [[runners.autoscaler.policy]]
      idle_count = 0
      idle_time = "24h" # AWS's macOS instances
    [runners.autoscaler.connector_config]
      username = "ec2-user"
      use_external_addr = false
      key_path = ---
      timeout = "1h" # connecting to a macOS instance can take some time, as they can be slow to provision
    [runners.autoscaler.plugin_config]
      name = ---
      region = "us-east-1"
    [runners.autoscaler.vm_isolation]
      enabled = true
      nesting_host = "unix:///Users/ec2-user/Library/Application Support/nesting.sock"
      [runners.autoscaler.vm_isolation.connector_config]
        username = "admin"
        password = ----
        timeout = "20m"
Used GitLab Runner version
Fails: 16.11.1 (fe451d5a) and later, including the latest 17.x release.
Works: 16.10.0 (81ab07f6).
Possible fixes
Edited by Zack Knight