Occasional macos job failures timing out on "Dialing nesting daemon"

Everyone can contribute. Help move this issue forward while earning points, leveling up and collecting rewards.

Close this issue

In the saas-macos-medium-m1-basic-tests project I have a scheduled pipeline running every other hour.

We are getting an occasional job failures, like so:

This is impacting only 1% of all jobs currently

All of these failures are in the form of:

Running with gitlab-runner 17.7.0~pre.103.g896916a8 (896916a8)
  on blue-2.saas-macos-medium-m1.runners-manager.gitlab.com/default tCo7q1eW, system ID: s_f10a77c3f6e9
Resolving secrets
Preparing the "instance" executor 01:00:00
Preparing instance...
Dialing instance i-06ba8edc53bb73918...
Instance i-06ba8edc53bb73918 connected
Enforcing VM Isolation
Creating nesting VM tunnel
Creating nesting VM macos-14-xcode-15
Dialing nesting daemon
ERROR: Job failed: execution took longer than 1h0m0s seconds

These start to build up on the macos instances that have been active the most and for the longest time period. This is a host degradation issue.

Most likely we can detect this repetitive failure and delete the host as suggested in Track instances health and remove unhealthy aft... (gitlab-org/fleeting/taskscaler#9)

Edited Jun 18, 2025 by 🤖 GitLab Bot 🤖