Rough MacOS on AWS Setup Instructions
The scope of this issue is to capture the steps required to set up MacOS running on AWS with the new Taskscaler architecture. Out of scope (for now): performing the actual setup and publishing this documentation as a blueprint. The goal is to give anyone who wants to experiment with MacOS on AWS a path to follow, and to serve as a learning exercise. Once this issue accurately captures the current state, another issue will be created to reflect these steps in an updated blueprint for autoscaling MacOS.
This setup uses nested virtualization so that each VM is used for a single job only (a multi-tenant setup). AWS Mac instances must run on dedicated hosts, and they take about 40 minutes to tear down (scrubbing) and another 40 minutes to fully start up, so consuming one instance per job is infeasible. Instead, we keep the instances around and start nested VMs on them using tart (a layer over Apple's Virtualization framework), one nested VM per job. Runner orchestrates this through a client/server tool called nesting.
As of Dec 5, 2022, support for nesting in runner is still only a merge request (!3654 (merged)), so you'll have to check it out locally.
Setting up MacOS on AWS
Create AWS environment
Create an AWS autoscaling group (ASG) for runner to use.
Create dedicated hosts
Create a few mac2.metal dedicated hosts. Select "Auto-placement" so your instances will automatically be scheduled onto these hosts.
Note: once allocated, these hosts cannot be released for 24 hours.
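If you prefer scripting to the console, host allocation maps onto boto3's EC2 `allocate_hosts` call. This sketch only builds the keyword arguments (so it runs without AWS credentials) and computes the earliest time a host can be released given the 24-hour minimum. The function names are hypothetical, and the availability zone and quantity you pass are placeholders; note that mac2.metal is not offered in every zone.

```python
from datetime import datetime, timedelta, timezone

# Mac dedicated hosts cannot be released until 24 hours after allocation.
MIN_ALLOCATION = timedelta(hours=24)


def allocate_hosts_request(availability_zone: str, quantity: int) -> dict:
    """Keyword arguments for boto3's EC2 client allocate_hosts()."""
    return {
        "AvailabilityZone": availability_zone,
        "InstanceType": "mac2.metal",
        "Quantity": quantity,
        "AutoPlacement": "on",  # the "Auto-placement" option in the console
    }


def earliest_release(allocated_at: datetime) -> datetime:
    """First moment a Mac dedicated host allocated at `allocated_at` can be released."""
    return allocated_at + MIN_ALLOCATION
```

Usage, assuming boto3 and credentials for your account: `boto3.client("ec2", region_name="us-west-2").allocate_hosts(**allocate_hosts_request("us-west-2a", 2))` returns the new host IDs under `"HostIds"`.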
Create a VPC
Create a new VPC using the "VPC and more" dialog. It creates a VPC, subnets, route tables, and an Internet Gateway. The defaults are fine as long as you select the availability zones where your dedicated hosts are. You don't need a NAT gateway because your instances will have public IPs.
On the created Network ACL, create an inbound rule that allows SSH traffic (TCP port 22).
On the same Network ACL, create an outbound rule that allows all traffic.
Make sure all subnets in the VPC are associated with this Network ACL.
In each subnet's route table, add a route (0.0.0.0/0) to the Internet Gateway.
Note: be sure to enable auto-assignment of a public IP address in each subnet so Taskscaler can connect to the instances.
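The network steps above can also be expressed as boto3 EC2 calls. The Python sketch below returns (method name, keyword arguments) pairs instead of calling AWS, so it can be inspected offline. The function names, the rule number 110, and the open 0.0.0.0/0 SSH source are assumptions; the subnet-to-ACL association step is not included.

```python
def network_acl_entries(acl_id: str, rule_number: int = 110) -> list[dict]:
    """Arguments for EC2 create_network_acl_entry(): inbound SSH, outbound all."""
    ssh_inbound = {
        "NetworkAclId": acl_id,
        "RuleNumber": rule_number,  # any number not already used by the ACL
        "Protocol": "6",  # TCP
        "RuleAction": "allow",
        "Egress": False,
        "CidrBlock": "0.0.0.0/0",  # narrow to your runner's address if you can
        "PortRange": {"From": 22, "To": 22},
    }
    all_outbound = {
        "NetworkAclId": acl_id,
        "RuleNumber": rule_number,  # outbound rules are numbered separately
        "Protocol": "-1",  # all protocols
        "RuleAction": "allow",
        "Egress": True,
        "CidrBlock": "0.0.0.0/0",
    }
    return [ssh_inbound, all_outbound]


def vpc_calls(
    acl_id: str, igw_id: str, subnet_route_tables: dict[str, str]
) -> list[tuple[str, dict]]:
    """(EC2 client method, kwargs) pairs; maps each subnet ID to its route table ID."""
    calls = [("create_network_acl_entry", e) for e in network_acl_entries(acl_id)]
    # Subnets can share a route table, so add the Internet Gateway route once per table.
    for route_table_id in sorted(set(subnet_route_tables.values())):
        calls.append(("create_route", {
            "RouteTableId": route_table_id,
            "DestinationCidrBlock": "0.0.0.0/0",
            "GatewayId": igw_id,
        }))
    for subnet_id in subnet_route_tables:
        # Auto-assign public IPv4 addresses so Taskscaler can reach the instances.
        calls.append(("modify_subnet_attribute", {
            "SubnetId": subnet_id,
            "MapPublicIpOnLaunch": {"Value": True},
        }))
    return calls
```

To apply them, something like `for method, kwargs in vpc_calls(...): getattr(ec2, method)(**kwargs)` with a boto3 EC2 client would work; if the wizard already added the Internet Gateway route, EC2 rejects the duplicate `create_route`.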
Copy the instance image into your AWS account
The ASG creates instances from an image that includes nesting and tart (the tools for starting nested VMs) as well as a few VM images. Creating these images is out of scope for these instructions.
From the group sandbox account (915502504722), grant your account permission to access ami-00d682a3997cf402f and its underlying snapshot snap-0a8ef4526793fdbae from eng-dev-verify-runner (Ireland region, eu-west-1).
From your sandbox account, copy the AMI to the region where you have created the dedicated hosts.
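Both steps correspond to EC2 API calls: `modify_image_attribute` and `modify_snapshot_attribute` in the source account, then `copy_image` in your own account. The Python sketch below only builds the boto3 keyword arguments; the function names are hypothetical, while the AMI, snapshot, and region come from the steps above.

```python
SOURCE_REGION = "eu-west-1"  # Ireland
SOURCE_AMI = "ami-00d682a3997cf402f"
SOURCE_SNAPSHOT = "snap-0a8ef4526793fdbae"


def share_requests(target_account: str) -> tuple[dict, dict]:
    """Kwargs for modify_image_attribute() and modify_snapshot_attribute().

    Run these in the group sandbox account with an eu-west-1 EC2 client.
    """
    image = {
        "ImageId": SOURCE_AMI,
        "LaunchPermission": {"Add": [{"UserId": target_account}]},
    }
    snapshot = {
        "SnapshotId": SOURCE_SNAPSHOT,
        "Attribute": "createVolumePermission",
        "OperationType": "add",
        "UserIds": [target_account],
    }
    return image, snapshot


def copy_request(name: str) -> dict:
    """Kwargs for copy_image(); call it on a client for the dedicated hosts' region."""
    return {"Name": name, "SourceImageId": SOURCE_AMI, "SourceRegion": SOURCE_REGION}
```

`copy_image` runs against the destination region and returns the new `"ImageId"`, which is what the launch template for the ASG should reference.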
Create an ASG
Create an ASG in your VPC, using a launch template based on the AMI you copied. Choose a minimum capacity of 0 and a maximum equal to the number of dedicated hosts you provisioned. Do not add scaling policies, because Taskscaler handles scaling for you.
Create a new SSH key pair for the launch template, or use an existing one. You will need its private key later when configuring the runner.
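A scripted version of the launch template and ASG could look like the Python sketch below, which builds keyword arguments for boto3's EC2 `create_launch_template` and Auto Scaling `create_auto_scaling_group`. The names "mac2" (chosen to match `name` in the runner's `plugin_config`) and the key pair name are assumptions. The `Placement` block is also an assumption, based on Mac instances only running on dedicated hosts; verify it against the EC2 Auto Scaling documentation for Mac instances.

```python
ASG_NAME = "mac2"  # assumption: same value as plugin_config.name in the runner config


def launch_template_request(ami_id: str, key_name: str) -> dict:
    """Kwargs for EC2 create_launch_template()."""
    return {
        "LaunchTemplateName": ASG_NAME,
        "LaunchTemplateData": {
            "ImageId": ami_id,  # the AMI copied into your region
            "InstanceType": "mac2.metal",
            "KeyName": key_name,  # its private key becomes key_path for runner
            # Assumption: host tenancy without a HostId lets EC2 use auto-placement hosts.
            "Placement": {"Tenancy": "host"},
        },
    }


def asg_request(subnet_ids: list[str], host_count: int) -> dict:
    """Kwargs for Auto Scaling create_auto_scaling_group(); no scaling policies."""
    return {
        "AutoScalingGroupName": ASG_NAME,
        "LaunchTemplate": {"LaunchTemplateName": ASG_NAME, "Version": "$Latest"},
        "MinSize": 0,
        "MaxSize": host_count,  # one mac2.metal instance per dedicated host
        "VPCZoneIdentifier": ",".join(subnet_ids),
    }
```

If you need a new key pair, EC2's `create_key_pair(KeyName=...)` returns the private key under `"KeyMaterial"`; save it to the file you will point `key_path` at.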
Configure runner
Configure your runner as follows.
Note: this configuration requires merge request !3654 (merged), which you can check out locally.
concurrent = 4
check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "local-taskrunner"
  url = "https://gitlab.com/"
  token = "REDACTED"
  executor = "instance"
  [runners.instance]
    allowed_images = ["*"]
  [runners.autoscaler]
    capacity_per_instance = 2  # nested VMs (jobs) per Mac instance
    max_use_count = 0
    max_instances = 2  # at most the number of dedicated hosts
    plugin = "fleeting-plugin-aws"
    [runners.autoscaler.plugin_config]
      name = "mac2"  # name of the ASG created above
      region = "us-west-2"
    [runners.autoscaler.connector_config]
      # SSH connection to the Mac instance itself
      username = "ec2-user"
      key_path = "/Users/josephburnett/.ssh/aws-mac.pem"
      timeout = "1h"
    [[runners.autoscaler.policy]]
      idle_count = 2
      idle_time = "24h"
    [runners.autoscaler.vm_isolation]
      enabled = true
      nesting_host = "unix:///Users/ec2-user/Library/Application Support/nesting.sock"
      image = "macos-12-xcode-14"  # one of the VM images included in the AMI
      [runners.autoscaler.vm_isolation.connector_config]
        # connection to the nested VM started for each job
        username = "gitlab"
        password = "gitlab"
        timeout = "1h"
Note: be sure to build the fleeting-plugin-aws binary and make it available on your PATH, as described in https://gitlab.com/gitlab-org/fleeting/fleeting-plugin-aws/-/blob/main/README.md.
The key_path entry is the path to the private key of the SSH key pair you associated with the launch template.
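A quick preflight check can catch both problems from the notes above before you start the runner. This Python sketch uses only the standard library; the function name and messages are hypothetical, and the key path you pass in is whatever you set as `key_path`.

```python
import os
import shutil


def preflight(key_path: str, plugin: str = "fleeting-plugin-aws") -> list[str]:
    """Return problems that would stop runner from reaching the AWS instances."""
    problems = []
    # The plugin binary must be on PATH, per the note above.
    if shutil.which(plugin) is None:
        problems.append(f"{plugin} not found on PATH")
    # key_path must point at the private key of the launch template's key pair.
    if not os.path.isfile(key_path):
        problems.append(f"SSH key not found: {key_path}")
    elif not os.access(key_path, os.R_OK):
        problems.append(f"SSH key not readable: {key_path}")
    return problems
```

An empty list means both the plugin and the key were found.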
Start runner
Start the runner, and it will begin health checking the instances. Once all the ASG instances are scheduled on the dedicated hosts and have come up healthy (look for "2/2 checks passed" in the EC2 console), runner should be able to connect to them and start nested VMs. Try running a job!
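The "2/2 checks" are the EC2 system and instance status checks. To watch for them from a script rather than the console, you can pass the response of boto3's EC2 `describe_instance_status` to a small filter like the Python sketch below. The function name is hypothetical; the response fields it reads (`InstanceStatuses`, `InstanceState`, `SystemStatus`, `InstanceStatus`) are the ones the EC2 API returns.

```python
def healthy_instances(status_response: dict) -> list[str]:
    """IDs of running instances that pass both EC2 status checks (2/2)."""
    healthy = []
    for status in status_response.get("InstanceStatuses", []):
        running = status["InstanceState"]["Name"] == "running"
        system_ok = status.get("SystemStatus", {}).get("Status") == "ok"
        instance_ok = status.get("InstanceStatus", {}).get("Status") == "ok"
        if running and system_ok and instance_ok:
            healthy.append(status["InstanceId"])
    return healthy
```

The instance IDs can come from the Auto Scaling client's `describe_auto_scaling_groups(AutoScalingGroupNames=["mac2"])`, and passing `IncludeAllInstances=True` to `describe_instance_status` also reports instances that are not yet running.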