
Docker autoscaler error on Ubuntu

Sometimes, seemingly at random, a job fails with:

ERROR: Job failed (system failure): prepare environment: Cannot connect to the Docker daemon at http://internal.tunnel.invalid. Is the docker daemon running? (docker.go:687:120s). Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information.

Things I tried:

  • Increased wait_for_services_timeout
  • Added a startup delay of 2, as proposed in #38 (closed)
  • Set instance_ready_command
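One variant not tried yet: the current instance_ready_command only waits for cloud-init, not for the Docker daemon itself. A minimal sketch of a stricter check follows. It assumes that the SSH user from connector_config can reach the Docker socket without sudo, which the runner needs anyway. It also assumes that the runner treats a non-zero exit from this command as "instance not ready". Both assumptions are unverified here.

```toml
  [runners.autoscaler]
    # Hypothetical stricter readiness check (untested):
    # 1. wait for cloud-init to finish;
    # 2. poll `docker info` every 2 s until the daemon answers.
    # `timeout` gives up after 120 s and exits non-zero.
    # Using ';' instead of '&&' means the Docker check still runs
    # even if cloud-init exits non-zero.
    instance_ready_command = "cloud-init status --wait; timeout 120 sh -c 'until docker info >/dev/null 2>&1; do sleep 2; done'"
```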

Current config:

  [runners.docker]
    image = "ruby:3.1"
    gpus = "all"
    volumes = ["/opt/dlami/nvme:/opt/dlami/nvme"]
    shm_size = 2073741824
    wait_for_services_timeout = 360
    privileged = true

  # Autoscaler config
  [runners.autoscaler]
    plugin = "aws" # in GitLab 16.11 and later, ensure you run `gitlab-runner fleeting install>

    # in GitLab 16.10 and earlier, manually install the plugin and use:
    # plugin = "fleeting-plugin-aws"

    capacity_per_instance = 1
    max_use_count = 99
    max_instances = 4
    instance_ready_command = "cloud-init status --wait"
    delete_instances_on_shutdown = true

    [runners.autoscaler.plugin_config] # plugin specific configuration (see plugin documentati>
      name             = "XXX"                # AWS Autoscaling Group name
      profile          = "XXX"                     # optional, default is 'default'
      config_file      = "XXX"      # optional, default is '~/.aws/config'
      credentials_file = "XXX" # optional, default is '~/.aws/creden>
      region = "eu-central-1"

    [runners.autoscaler.connector_config]
      username          = "XXX"
      use_external_addr = true
      key_path = "XXX"
      protocol = "ssh"

    [[runners.autoscaler.policy]]
      idle_count = 0
      idle_time = "0m30s"
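To check whether the failures only affect reused instances, a diagnostic variant could retire every instance after a single job. This is a hypothesis, not something established above. With max_use_count = 99, an instance can serve up to 99 jobs before it is removed. With 1, each job should get a freshly provisioned instance, at the cost of one boot per job.

```toml
  [runners.autoscaler]
    # Diagnostic only: remove each instance after one job, so every job
    # runs on a newly booted machine. All other settings stay as in the
    # config above. If the error disappears, instance reuse is a suspect.
    max_use_count = 1
```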
Edited by Alexander Court