Occasional macos job failures timing out on "Dialing nesting daemon"

Everyone can contribute. Help move this issue forward while earning points, leveling up and collecting rewards.

In the saas-macos-medium-m1-basic-tests project I have a scheduled pipeline running every other hour.

We are getting an occasional job failures, like so:

This is impacting only 1% of all jobs currently

All of these failures are in the form of:

Running with gitlab-runner 17.7.0~pre.103.g896916a8 (896916a8)
  on blue-2.saas-macos-medium-m1.runners-manager.gitlab.com/default tCo7q1eW, system ID: s_f10a77c3f6e9
Resolving secrets
Preparing the "instance" executor 01:00:00
Preparing instance...
Dialing instance i-06ba8edc53bb73918...
Instance i-06ba8edc53bb73918 connected
Enforcing VM Isolation
Creating nesting VM tunnel
Creating nesting VM macos-14-xcode-15
Dialing nesting daemon
ERROR: Job failed: execution took longer than 1h0m0s seconds

These start to build up on the macos instances that have been active the most and for the longest time period. This is a host degradation issue.

Most likely we can detect this repetitive failure and delete the host as suggested in Track instances health and remove unhealthy aft... (gitlab-org/fleeting/taskscaler#9)

Edited by 🤖 GitLab Bot 🤖