Fix nesting restarts
For our SaaS macOS runners we use nesting to handle nested virtualisation of job VMs on the AWS instances. For that we have a custom AMI built on top of the Mac AMI distributed by AWS.
Nesting is installed there as a launchd-managed service. However, it may occasionally fail (as with the bug described in gitlab-org/fleeting/nesting!9 (merged)).
From what I understand from this service configuration file:
<key>KeepAlive</key>
<true />
in case of failure the service should be restarted automatically. Unfortunately, it isn't.
When started, nesting creates a socket file, in our case placed at /Users/ec2-user/Library/Application Support/nesting.sock. If the daemon is not shut down cleanly, that socket is left behind. When launchd restarts the process, it fails to start because net.Listen("unix", socket) fails with socket already exists.
We should either update nesting to delete and re-create the socket on startup, or add a wrapper around nesting in our AMI (a shell script) that deletes the socket file, if it exists, before nesting itself is started.