docker+machine executor fails sometimes in "preparing environment"
Summary
The docker+machine runner intermittently fails jobs during the "prepare environment" stage, roughly once a week.
Steps to reproduce
This appears to be a sporadic failure. I'm open to suggestions for additional instrumentation or logging that would help track down the issue.
Actual behavior
Occasionally a job fails with the following error:
ERROR: Job failed (system failure): prepare environment: Cannot connect to the Docker daemon at tcp://xx.xx.xx.xx:2376. Is the docker daemon running? (docker.go:705:120s). Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information
The IP address given appears to be of the VM that is started and stopped by the docker+machine runner, as it does not match the IP address of the runner itself.
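One way to gather more data when this happens might be to probe the daemon port on the affected VM from the runner host. The following is a minimal sketch, not something I have run against this setup: the helper names `probe_tcp` and `watch` are made up for illustration, and port 2376 is taken from the error message above. It only checks that a TCP connection can be opened; it says nothing about whether the TLS handshake or the Docker API itself works.

```python
import socket
import time
from datetime import datetime, timezone
from typing import List, Tuple


def probe_tcp(host: str, port: int = 2376, timeout: float = 3.0) -> Tuple[bool, str]:
    """Attempt a single TCP connection and report (reachable, detail)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False, f"{type(exc).__name__}: {exc}"


def watch(host: str, port: int = 2376, attempts: int = 5,
          interval: float = 1.0, timeout: float = 3.0) -> List[bool]:
    """Probe repeatedly, printing one timestamped line per attempt."""
    results = []
    for i in range(attempts):
        ok, detail = probe_tcp(host, port, timeout)
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        print(f"{stamp} {host}:{port} reachable={ok} ({detail})")
        results.append(ok)
        if i < attempts - 1:
            time.sleep(interval)
    return results
```

Calling `watch("xx.xx.xx.xx")` with the IP from the failed job's error message, while the VM still exists, would show whether the port is refusing connections, timing out, or reachable again. Comparing those timestamps against the runner log might help tell a slow-starting daemon apart from a network or security group problem.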
Expected behavior
The runner should not fail the job with a system failure while preparing the environment.
Environment description
This uses a docker+machine runner in AWS.
config.toml contents
concurrent = 8
check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  limit = 4
  name = "aws-vm-autoscaling-runner"
  url = "https://gitlab.com/"
  token = "xxxxxxx-xxxxxxxxxxxx"
  executor = "docker+machine"
  environment = ["DOCKER_AUTH_CONFIG={\"credHelpers\":{\"xxxx.dkr.ecr.us-west-2.amazonaws.com\":\"ecr-login\"}}"]
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
  [runners.docker]
    tls_verify = false
    image = "docker"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = [
      "/cache",
      "/var/run/docker.sock:/var/run/docker.sock",
      "/home/ubuntu/.docker:/root/.docker",
      "/usr/local/bin/docker-credential-ecr-login:/usr/local/bin/docker-credential-ecr-login"
    ]
    shm_size = 0
  [runners.machine]
    IdleCount = 1
    IdleTime = 2400
    OffPeakPeriods = [
      "* * 0-8,19-23 * * mon-fri *",
      "* * * * * sat,sun *"
    ]
    OffPeakTimezone = "America/Los_Angeles"
    OffPeakIdleCount = 0
    OffPeakIdleTime = 600
    MachineDriver = "amazonec2"
    MachineName = "gitlab-ci-autoscale-%s"
    MachineOptions = [
      "amazonec2-access-key=xxxxx",
      "amazonec2-secret-key=xxxxx",
      "amazonec2-instance-type=m4.2xlarge",
      "amazonec2-region=us-west-2",
      "amazonec2-vpc-id=vpc-xxxxx",
      "amazonec2-iam-instance-profile=GitLabCI",
      "amazonec2-ami=ami-xxxxx",
      "amazonec2-root-size=32",
      "amazonec2-tags=project,ci"
    ]
Docker version:
Client:
Version: 18.09.7
API version: 1.39
Go version: go1.10.1
Git commit: 2d0083d
Built: Fri Aug 16 14:20:06 2019
OS/Arch: linux/amd64
Experimental: false
Server:
Engine:
Version: 18.09.7
API version: 1.39 (minimum version 1.12)
Go version: go1.10.1
Git commit: 2d0083d
Built: Wed Aug 14 19:41:23 2019
OS/Arch: linux/amd64
Experimental: false
Used GitLab Runner version
Version: 13.1.0
Git revision: 6214287e
Git branch: 13-1-stable
GO version: go1.13.8
Built: 2020-06-19T21:12:22+0000
OS/Arch: linux/amd64
We were also seeing the same issue on an older version of GitLab Runner. I updated the runner a few weeks ago to see if that would resolve the problem. It has not.
Edited by Peter Baughman