Docker autoscaler error on Ubuntu
Intermittently, with no obvious pattern, a job fails with the following error:
ERROR: Job failed (system failure): prepare environment: Cannot connect to the Docker daemon at http://internal.tunnel.invalid. Is the docker daemon running? (docker.go:687:120s). Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information
Things I have tried so far (the error still occurs):
- increased wait_for_services_timeout
- added a startup delay of 2, as proposed in #38 (closed)
- set instance_ready_command
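A variation on the instance_ready_command attempt would be to wait for the Docker daemon itself, not only for cloud-init. The sketch below is a minimal illustration, not a confirmed fix. It assumes the instance has the docker CLI installed and that the SSH user may run `docker info`. The helper name `wait_for` and the 120-second budget are hypothetical choices, not taken from the runner documentation.

```shell
#!/bin/sh
# wait_for LIMIT INTERVAL CMD...
# Hypothetical helper: re-run CMD every INTERVAL seconds until it succeeds.
# Returns 0 on success, 1 once at least LIMIT seconds of waiting have passed.
wait_for() {
    limit=$1
    interval=$2
    shift 2
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$limit" ]; then
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}

# Assumed usage on the instance (commented out so the file can be sourced safely):
# cloud-init status --wait && wait_for 120 2 docker info
```

If saved on the instance image as a script, it could be referenced from instance_ready_command in place of the bare `cloud-init status --wait`. A non-zero exit would then mark the instance as not ready instead of letting a job start against a daemon that is still coming up.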
Current config:
[runners.docker]
image = "ruby:3.1"
gpus = "all"
volumes = ["/opt/dlami/nvme:/opt/dlami/nvme"]
shm_size = 2073741824
wait_for_services_timeout = 360
privileged = true
# Autoscaler config
[runners.autoscaler]
plugin = "aws" # in GitLab 16.11 and later, ensure you run `gitlab-runner fleeting install>
# in GitLab 16.10 and earlier, manually install the plugin and use:
# plugin = "fleeting-plugin-aws"
capacity_per_instance = 1
max_use_count = 99
max_instances = 4
instance_ready_command = "cloud-init status --wait"
delete_instances_on_shutdown = true
[runners.autoscaler.plugin_config] # plugin specific configuration (see plugin documentati>
name = "XXX" # AWS Autoscaling Group name
profile = "XXX" # optional, default is 'default'
config_file = "XXX" # optional, default is '~/.aws/config'
credentials_file = "XXX" # optional, default is '~/.aws/creden>
region = "eu-central-1"
[runners.autoscaler.connector_config]
username = "XXX"
use_external_addr = true
key_path = "XXX"
protocol = "ssh"
[[runners.autoscaler.policy]]
idle_count = 0
idle_time = "0m30s"
Edited by Alexander Court