Failed to run jobs on GitLab Runner with docker-autoscaler in Azure

Summary

Hi folks! I'm trying to configure a GitLab Runner instance that uses the docker-autoscaler executor and the Azure fleeting plugin, backed by an Azure Scale Set.

When the runner starts, it updates the Scale Set (I can see new instances starting up), but then the following error appears and the job never starts:

ERROR: Job failed (system failure): Cannot connect to the Docker daemon at http://internal.tunnel.invalid. Is the docker daemon running? (docker.go:951:0s)

Steps to reproduce

I created a managed Azure VM image from a VM running Ubuntu 22.04 LTS with Docker installed. On that VM I added azureuser to the docker group and configured Docker to start on boot with systemctl.
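
The three properties described above (group membership, Docker enabled at boot, daemon reachable) can be checked on a VM booted from the image. The following is only a minimal sketch, not the commands actually used to build the image; `in_group` and `check_image` are hypothetical helper names, and it assumes the checks are run while logged in as the connector user (azureuser here).

```shell
#!/bin/sh
# Hypothetical helpers for sanity-checking a VM booted from the image.

# Succeeds if group $2 appears in the space-separated group list $1.
in_group() {
  for g in $1; do
    [ "$g" = "$2" ] && return 0
  done
  return 1
}

# Runs the checks for the given user (default: azureuser).
# Run this while logged in as that user, so that `docker info`
# exercises the same permissions the runner would get over SSH.
check_image() {
  user="${1:-azureuser}"
  in_group "$(id -nG "$user")" docker || { echo "$user is not in the docker group"; return 1; }
  systemctl is-enabled --quiet docker || { echo "docker is not enabled at boot"; return 1; }
  docker info >/dev/null 2>&1 || { echo "cannot reach the Docker daemon"; return 1; }
  echo "image looks OK"
}
```

Calling `check_image azureuser` on the instance prints the first check that fails, or `image looks OK` if all three pass.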

Then I created the Scale Set and configured my runner.

Actual behavior

The VM is created, but the job fails with this error:

ERROR: Job failed (system failure): Cannot connect to the Docker daemon at http://internal.tunnel.invalid. Is the docker daemon running? (docker.go:951:0s)

Expected behavior

The job runs and finishes successfully.

Relevant logs and/or screenshots

job log
Preparing the "docker-autoscaler" executor
00:20
Dialing instance 3...
Instance 3 connected
ERROR: Failed to remove network for build
ERROR: Preparation failed: Cannot connect to the Docker daemon at http://internal.tunnel.invalid. Is the docker daemon running? (docker.go:951:10s)
Will be retried in 3s ...
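
The log shows that the instance connection succeeds and only the Docker call fails. One way to narrow this down is to run a Docker command over SSH from the runner host with the same user as in connector_config. This is a sketch under assumptions: `probe_docker` is a hypothetical helper, the instance address must be filled in by hand, and it assumes the runner reaches Docker on the instance through its SSH connection.

```shell
#!/bin/sh
# Hypothetical helper: checks whether `docker info` works over SSH
# for the given user on the given instance address.
probe_docker() {
  user="$1"
  host="$2"
  if ssh -o ConnectTimeout=10 "$user@$host" docker info >/dev/null; then
    echo "Docker reachable as $user on $host"
  else
    echo "Docker NOT reachable as $user on $host"
    return 1
  fi
}
```

For example, `probe_docker azureuser INSTANCE_ADDRESS`, where INSTANCE_ADDRESS is a placeholder for the address of one of the Scale Set instances. If this fails while SSH login itself works, the problem is likely on the instance (daemon not running yet, or the user lacks access to the Docker socket) rather than in the runner configuration.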

Environment description

config.toml contents
concurrent = 1
check_interval = 0
connection_max_age = "15m0s"
shutdown_timeout = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "My Runner"
  url = "https://gitlab.com"
  id = 123123123
  token = "glrt-Some-Token"
  token_obtained_at = 2024-03-28T14:02:04Z
  token_expires_at = 0001-01-01T00:00:00Z
  executor = "docker-autoscaler"

  [runners.cache]
    MaxUploadedArchiveSize = 0

  [runners.docker]
    tls_verify = false
    image = "docker:stable"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache", "/certs/client"]
    shm_size = 0
    network_mtu = 0
    services_limit = -1

  # Autoscaler config
  [runners.autoscaler]
    plugin = "azure" # for >= 16.11, ensure you run `gitlab-runner fleeting install` to automatically install the plugin

    # for versions < 17.0, manually install the plugin and use:
    # plugin = "fleeting-plugin-azure"

    capacity_per_instance = 1
    max_use_count = 1
    max_instances = 10

    [runners.autoscaler.plugin_config] # plugin specific configuration (see plugin documentation)
      name = "gitlab-runner-scale-set"
      subscription_id = "SUBSCRIPTION_ID"
      resource_group_name = "gitlab-runner"

    [runners.autoscaler.connector_config]
      username = "azureuser"
      password = "SOME-PASSWORD"
      use_static_credentials = true
      timeout = "10m"
      use_external_addr = true

    [[runners.autoscaler.policy]]
      idle_count = 1
      idle_time = "10m0s"

Used GitLab Runner version

- Ubuntu 22.04 LTS
- Docker version 26.1.3, build b72abbb
- GitLab Runner Version:      17.0.0
- Git revision: 44feccdf
- Git branch:   17-0-stable
- GO version:   go1.21.9
- Built:        2024-05-16T13:46:14+0000
- OS/Arch:      linux/amd64

Possible fixes