Intermittent job failures ("error during connect") with docker-autoscaler and AWS fleeting plugin
Summary
We are trying out the experimental fleeting support on AWS with GitLab Runner 16.5. Quite often (more than one job in ten), a job fails with an error in the middle of its script.
Steps to reproduce
It happens seemingly at random, across all jobs in all repositories. I haven't been able to identify a pattern yet.
Actual behavior
Jobs randomly fail, with nearly identical error messages (see below).
Expected behavior
The jobs don't fail :)
Relevant logs and/or screenshots
section_end:1699277419:step_script
section_start:1699277419:cleanup_file_variables
Cleaning up project directory and file based variables
WARNING: Failed to inspect predefined container 01173e3c3a77515196f1f9726205e28878c4ceafe01c687e1fce2bafcc7b4c63 error during connect: Get "http://internel.tunnel.invalid/v1.43/containers/01173e3c3a77515196f1f9726205e28878c4ceafe01c687e1fce2bafcc7b4c63/json": dialing environment connection: EOF (docker_command.go:134:0s)
Using helper image: registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-853330f9
Pulling docker image registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-853330f9 ...
WARNING: Failed to pull image with policy "always": error during connect: Post "http://internel.tunnel.invalid/v1.43/images/create?fromImage=registry.gitlab.com%2Fgitlab-org%2Fgitlab-runner%2Fgitlab-runner-helper&tag=x86_64-853330f9": dialing environment connection: EOF (manager.go:237:0s)
section_end:1699277419:cleanup_file_variables
ERROR: Failed to cleanup volumes
ERROR: Job failed (system failure): error during connect: Post "http://internel.tunnel.invalid/v1.43/containers/748a7e651c94ecf42d894f0e8b6bbe5f1d3e45b7e4deea36bff3a47a12719c59/wait?condition=not-running": dialing environment connection: EOF
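Since every failing request in the log above ends in "dialing environment connection: EOF", one way to look for a pattern across many failed jobs might be to extract the Docker API calls that hit this error from the raw job logs. The following is only a rough sketch, not part of GitLab Runner: the function name `failed_requests` and the choice of signature string are my own assumptions, and it presumes the raw log has already been downloaded as text.

```python
import re

# Matches ANSI colour/erase sequences, with or without the leading ESC byte,
# because copied job logs often lose the ESC character.
ANSI_RE = re.compile(r"\x1b?\[[0-9;]*[Km]")

# Assumption: this text is what all the failures in this report have in common.
SIGNATURE = "dialing environment connection: EOF"

# Captures the HTTP method and the Docker API path of the failing request.
REQUEST_RE = re.compile(r'(Get|Post) "http://[^/]+(/[^"]*)"')


def failed_requests(raw_log: str) -> list[tuple[str, str]]:
    """Return (HTTP method, API path) for every log line carrying SIGNATURE."""
    hits = []
    for line in raw_log.splitlines():
        clean = ANSI_RE.sub("", line)
        if SIGNATURE not in clean:
            continue
        match = REQUEST_RE.search(clean)
        if match:
            hits.append((match.group(1), match.group(2)))
    return hits
```

Running `failed_requests` over the raw logs of several failed jobs (for example, text read from a hypothetical `job.log` file) would show whether the EOF always hits the same kind of API call or the same phase of the job.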
Environment description
docker info output
Client:
Version: 24.0.5
Context: default
Debug Mode: false
Server:
Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 1
Server Version: 24.0.5
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version:
runc version:
init version:
Security Options:
apparmor
seccomp
Profile: builtin
Kernel Version: 5.4.0-166-generic
Operating System: Ubuntu 20.04.6 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 1.925GiB
Name: trevor
ID: 7ce1107e-baff-4b96-a3c6-846b937a96de
Docker Root Dir: /var/lib/docker
Debug Mode: false
Experimental: false
Insecure Registries:
127.0.0.0/8
Registry Mirrors:
https://docker-mirror.active-group.de/
Live Restore Enabled: false
WARNING: No swap limit support
config.toml contents
concurrent = 20
check_interval = 2
shutdown_timeout = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "ec2-docker-autoscaler"
limit = 10
url = "https://gitlab.active-group.de"
id = 16
token = "FOOBAR"
token_obtained_at = 2023-07-07T14:14:03Z
token_expires_at = 0001-01-01T00:00:00Z
executor = "docker-autoscaler"
environment = ["DOCKER_TLS_CERTDIR="]
[runners.docker]
tls_verify = false
image = "alpine:latest"
privileged = true
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = true
shm_size = 0
[runners.autoscaler]
capacity_per_instance = 1
max_use_count = 1
max_instances = 10
plugin = "/etc/gitlab-runner/fleeting-plugin-aws-linux-amd64"
[runners.autoscaler.plugin_config]
credentials_file = "/etc/gitlab-runner/aws-credentials"
config_file = "/etc/gitlab-runner/aws-config"
name = "gitlab-autoscaling-group"
[runners.autoscaler.connector_config]
username = "ubuntu"
use_static_credentials = false
use_external_addr = true
Used GitLab Runner version
Version: 16.5.0
Git revision: 853330f9
Git branch: 16-5-stable
GO version: go1.20.10
Built: 2023-10-20T15:57:21+0000
OS/Arch: linux/amd64