AWS EC2 autoscaling does not work because installing Docker fails
Summary
Today our GitLab Runner with AWS EC2 autoscaling (docker-machine) suddenly stopped working, and I'm still trying to find out why it fails.
I have already tried:
- Updating all software components to the newest versions
- Recreating the docker-machine certs
- Rebooting the system
None of this solved the issue. I still can't successfully start the docker machines. They start on AWS, and I can connect to them while they boot via docker-machine ssh <name>, but they get killed, and docker-machine ls shows Unable to query docker version: Cannot connect to the docker engine endpoint for them.
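Since SSH works but the Docker engine endpoint does not answer, one thing that can be checked from the runner manager is whether the engine port on the new instance accepts TCP connections at all. The following is only a minimal sketch: it assumes the engine listens on 2376 (docker-machine's default engine port, not overridden in my setup), and the host in the usage comment is a placeholder, not a value from my environment.

```python
import socket


def endpoint_reachable(host, port=2376, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    Port 2376 is assumed to be the Docker engine port that
    docker-machine talks to; adjust it if your setup differs.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Hypothetical usage with the private IP reported by `docker-machine ip <name>`:
# print(endpoint_reachable("<private-ip-of-instance>"))
```

A False result only shows that nothing is accepting connections on that port (for example because Docker was never installed or the security group blocks it); it does not say which of those is the cause.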
Versions:
- Docker: Docker version 20.10.14, build a224086
- Docker Machine: docker-machine version 0.16.2-gitlab.13, build dac65f58
- GitLab Runner: 14.10.0 c6bb62f6
Error log with debug enabled
Apr 26 17:52:13 ip-172-31-39-74 gitlab-runner[19337]: Requeued the runner builds=1 runner=rtB9T4xx
Apr 26 17:52:13 ip-172-31-39-74 gitlab-runner[19337]: Running with gitlab-runner 14.10.0 (c6bb62f6) job=2379908834 project=3139350 runner=rtB9T4xx
Apr 26 17:52:13 ip-172-31-39-74 gitlab-runner[19337]: on autoscaling-mycompany rtB9T4xx job=2379908834 project=3139350 runner=rtB9T4xx
Apr 26 17:52:13 ip-172-31-39-74 gitlab-runner[19337]: Preparing the "docker+machine" executor job=2379908834 project=3139350 runner=rtB9T4xx
Apr 26 17:52:13 ip-172-31-39-74 gitlab-runner[19337]: Executing /usr/local/bin/docker-machine [docker-machine --bugsnag-api-token=no-report create --driver amazonec2 --amazonec2-request-spot-instance=false --amazonec2-access-key=<redacted> --amazonec2-secret-key=<redacted> --amazonec2-region=eu-central-1 --amazonec2-vpc-id=vpc-<redacted> --amazonec2-subnet-id=subnet-<redacted> --amazonec2-zone=b --amazonec2-use-private-address=true --amazonec2-tags=runner-manager-name,gitlab-aws-autoscaler,gitlab,true,gitlab-runner-autoscale,true --amazonec2-security-group=gitlab-aws-runner --amazonec2-instance-type=t3.large --amazonec2-root-size=80 --engine-registry-mirror=http://172.31.39.74:6000 runner-rtb9t4xx-gitlab-docker-machine-1650995533-89265a3f]
Apr 26 17:52:14 ip-172-31-39-74 gitlab-runner[19337]: Running pre-create checks... driver=amazonec2 name=runner-rtb9t4xx-gitlab-docker-machine-1650995533-89265a3f operation=create
Apr 26 17:52:14 ip-172-31-39-74 gitlab-runner[19337]: Creating machine... driver=amazonec2 name=runner-rtb9t4xx-gitlab-docker-machine-1650995533-89265a3f operation=create
Apr 26 17:52:14 ip-172-31-39-74 gitlab-runner[19337]: (runner-rtb9t4xx-gitlab-docker-machine-1650995533-89265a3f) Launching instance... driver=amazonec2 name=runner-rtb9t4xx-gitlab-docker-machine-1650995533-89265a3f operation=create
Apr 26 17:52:14 ip-172-31-39-74 gitlab-runner[19337]: Checking for jobs... nothing runner=rtB9T4xx
[...]
Apr 26 17:52:16 ip-172-31-39-74 gitlab-runner[19337]: Dialing: tcp gitlab.com:443 ...
Apr 26 17:52:17 ip-172-31-39-74 gitlab-runner[19337]: Appending trace to coordinator... ok code=202 job=2379908834 job-log=0-206 job-status=running runner=rtB9T4xx sent-log=0-205 status=202 Accepted update-interval=3s
[...]
Apr 26 17:52:22 ip-172-31-39-74 gitlab-runner[19337]: Waiting for machine to be running, this may take a few minutes... driver=amazonec2 name=runner-rtb9t4xx-gitlab-docker-machine-1650995533-89265a3f operation=create
Apr 26 17:52:22 ip-172-31-39-74 gitlab-runner[19337]: Detecting operating system of created instance... driver=amazonec2 name=runner-rtb9t4xx-gitlab-docker-machine-1650995533-89265a3f operation=create
Apr 26 17:52:22 ip-172-31-39-74 gitlab-runner[19337]: Waiting for SSH to be available... driver=amazonec2 name=runner-rtb9t4xx-gitlab-docker-machine-1650995533-89265a3f operation=create
[...]
Apr 26 17:52:41 ip-172-31-39-74 gitlab-runner[19337]: Detecting the provisioner... driver=amazonec2 name=runner-rtb9t4xx-gitlab-docker-machine-1650995533-89265a3f operation=create
Apr 26 17:52:41 ip-172-31-39-74 gitlab-runner[19337]: Provisioning with ubuntu(systemd)... driver=amazonec2 name=runner-rtb9t4xx-gitlab-docker-machine-1650995533-89265a3f operation=create
[...]
Apr 26 17:52:54 ip-172-31-39-74 gitlab-runner[19337]: Installing Docker... driver=amazonec2 name=runner-rtb9t4xx-gitlab-docker-machine-1650995533-89265a3f operation=create
[...]
Apr 26 17:53:15 ip-172-31-39-74 gitlab-runner[19337]: ERROR: Error creating machine: Error running provisioning: error installing docker: driver=amazonec2 name=runner-rtb9t4xx-gitlab-docker-machine-1650995533-89265a3f operation=create
Apr 26 17:53:15 ip-172-31-39-74 gitlab-runner[19337]: ERROR: Machine creation failed error=exit status 1 name=runner-rtb9t4xx-gitlab-docker-machine-1650995533-89265a3f time=1m1.861464233s
Apr 26 17:53:15 ip-172-31-39-74 gitlab-runner[19337]: WARNING: Requesting machine removal lifetime=1m1.861688723s name=runner-rtb9t4xx-gitlab-docker-machine-1650995533-89265a3f now=2022-04-26 17:53:15.786350131 +0000 UTC m=+174.028766006 reason=Failed to create used=1m1.861689155s usedCount=0
Apr 26 17:53:15 ip-172-31-39-74 gitlab-runner[19337]: WARNING: Stopping machine lifetime=1m1.934200501s name=runner-rtb9t4xx-gitlab-docker-machine-1650995533-89265a3f reason=Failed to create used=72.474199ms usedCount=0
Apr 26 17:53:15 ip-172-31-39-74 gitlab-runner[19337]: Stopping "runner-rtb9t4xx-gitlab-docker-machine-1650995533-89265a3f"... name=runner-rtb9t4xx-gitlab-docker-machine-1650995533-89265a3f operation=stop
[...]
Apr 26 17:53:16 ip-172-31-39-74 gitlab-runner[19337]: Executing /usr/local/bin/docker-machine [docker-machine --bugsnag-api-token=no-report create --driver amazonec2 --amazonec2-request-spot-instance=false --amazonec2-access-key=xxx --amazonec2-secret-key=xxx --amazonec2-region=eu-central-1 --amazonec2-vpc-id=vpc-0ff1a867 --amazonec2-subnet-id=subnet-88230ef2 --amazonec2-zone=b --amazonec2-use-private-address=true --amazonec2-tags=runner-manager-name,gitlab-aws-autoscaler,gitlab,true,gitlab-runner-autoscale,true --amazonec2-security-group=gitlab-aws-runner --amazonec2-instance-type=t3.large --amazonec2-root-size=80 --engine-registry-mirror=http://172.31.39.74:6000 runner-rtb9t4xx-gitlab-docker-machine-1650995596-ebf056f1]
Apr 26 17:53:16 ip-172-31-39-74 gitlab-runner[19337]: Running pre-create checks... driver=amazonec2 name=runner-rtb9t4xx-gitlab-docker-machine-1650995596-ebf056f1 operation=create
Apr 26 17:53:16 ip-172-31-39-74 gitlab-runner[19337]: Creating machine... driver=amazonec2 name=runner-rtb9t4xx-gitlab-docker-machine-1650995596-ebf056f1 operation=create
Apr 26 17:53:17 ip-172-31-39-74 gitlab-runner[19337]: (runner-rtb9t4xx-gitlab-docker-machine-1650995596-ebf056f1) Launching instance... driver=amazonec2 name=runner-rtb9t4xx-gitlab-docker-machine-1650995596-ebf056f1 operation=create
Apr 26 17:53:17 ip-172-31-39-74 gitlab-runner[19337]: Submitting job to coordinator... ok code=200 job=2379908834 job-status= runner=rtB9T4xx update-interval=0s
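To make it easier to see at which step each machine fails, the log above can be condensed per machine. This is a rough sketch that assumes the line format shown in this log (a `gitlab-runner[pid]:` prefix, a message, then `key=value` fields including `name=runner-...`); the function name `summarize` is my own and is not part of any GitLab tooling.

```python
import re

# Syslog prefix up to and including "gitlab-runner[<pid>]:".
PREFIX_RE = re.compile(r"^.*?gitlab-runner\[\d+\]:\s*")
# Machine name as it appears in the trailing key=value fields.
NAME_RE = re.compile(r"\bname=(runner-\S+)")
# Start of the first trailing key=value field.
FIELDS_RE = re.compile(r"\s+[\w-]+=")


def summarize(lines):
    """Map each machine name to its last step before an error and its errors."""
    machines = {}
    for line in lines:
        match = NAME_RE.search(line)
        if match is None:
            continue  # line does not belong to a specific machine
        # Drop the syslog prefix, then cut off the key=value fields.
        message = FIELDS_RE.split(PREFIX_RE.sub("", line.strip()), maxsplit=1)[0]
        entry = machines.setdefault(match.group(1), {"last_step": None, "errors": []})
        if message.startswith("ERROR:"):
            entry["errors"].append(message)
        elif not message.startswith("WARNING:") and not entry["errors"]:
            # Only track progress up to the first error.
            entry["last_step"] = message
    return machines


# Four lines copied from the log above.
SAMPLE = [
    "Apr 26 17:52:54 ip-172-31-39-74 gitlab-runner[19337]: Installing Docker... driver=amazonec2 name=runner-rtb9t4xx-gitlab-docker-machine-1650995533-89265a3f operation=create",
    "Apr 26 17:53:15 ip-172-31-39-74 gitlab-runner[19337]: ERROR: Error creating machine: Error running provisioning: error installing docker: driver=amazonec2 name=runner-rtb9t4xx-gitlab-docker-machine-1650995533-89265a3f operation=create",
    "Apr 26 17:53:15 ip-172-31-39-74 gitlab-runner[19337]: ERROR: Machine creation failed error=exit status 1 name=runner-rtb9t4xx-gitlab-docker-machine-1650995533-89265a3f time=1m1.861464233s",
    "Apr 26 17:53:15 ip-172-31-39-74 gitlab-runner[19337]: Stopping \"runner-rtb9t4xx-gitlab-docker-machine-1650995533-89265a3f\"... name=runner-rtb9t4xx-gitlab-docker-machine-1650995533-89265a3f operation=stop",
]

for name, info in summarize(SAMPLE).items():
    print(name, info["last_step"], info["errors"])
```

For the first machine in this log, the last step before the error is the Docker installation, which matches the error message itself.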
config.toml
log_level = "debug"
concurrent = 15
check_interval = 0
listen_address = "172.31.39.74:9252"
[session_server]
listen_address = "[::]:8093" # listen on all available interfaces on port 8093
advertise_address = "xxx"
session_timeout = 3600
[[runners]]
name = "autoscaling-mycompany"
limit = 15
url = "https://gitlab.com/"
token = "xxx"
executor = "docker+machine"
[runners.docker]
tls_verify = false
image = "alpine"
privileged = true
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = true
shm_size = 0
wait_for_services_timeout = 120
[runners.cache]
Type = "s3"
Path = "cache"
Shared = true
[runners.cache.s3]
ServerAddress = "s3.amazonaws.com"
AccessKey = "xxx"
SecretKey = "xxx"
BucketName = "xxx"
BucketLocation = "eu-central-1"
[runners.cache.gcs]
[runners.machine]
IdleCount = 0
IdleTime = 900
MaxBuilds = 15
MachineDriver = "amazonec2"
MachineName = "gitlab-docker-machine-%s"
MachineOptions = [
  "amazonec2-request-spot-instance=false",
  "amazonec2-access-key=xxx",
  "amazonec2-secret-key=xxx",
  "amazonec2-region=eu-central-1",
  "amazonec2-vpc-id=vpc-xxx",
  "amazonec2-subnet-id=subnet-xxx",
  "amazonec2-zone=b",
  "amazonec2-use-private-address=true",
  "amazonec2-tags=runner-manager-name,gitlab-aws-autoscaler,gitlab,true,gitlab-runner-autoscale,true",
  "amazonec2-security-group=gitlab-aws-runner",
  "amazonec2-instance-type=t3.large",
  "amazonec2-root-size=80",
  "engine-registry-mirror=http://172.31.39.74:6000"
]
OffPeakTimezone = ""
OffPeakIdleCount = 0
OffPeakIdleTime = 0
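To reproduce the failure outside of the runner, the same docker-machine create call can be assembled by hand from the MachineOptions above. The sketch below mirrors the command shape visible in the log (each option prefixed with `--`, machine name last) and adds docker-machine's global `--debug` flag for more verbose output. The helper name `build_create_command` and the machine name `manual-debug-1` are made up for illustration, and the `xxx` values are the same redacted placeholders as in the config.

```python
import shlex


def build_create_command(machine_options, machine_name, driver="amazonec2", debug=True):
    """Build a `docker-machine create` argument list from MachineOptions entries."""
    cmd = ["docker-machine"]
    if debug:
        cmd.append("--debug")  # global flag, must come before the subcommand
    cmd += ["create", "--driver", driver]
    # Each MachineOptions entry becomes a long option, as seen in the runner log.
    cmd += ["--" + option for option in machine_options]
    cmd.append(machine_name)
    return cmd


# A subset of the options from config.toml, placeholders left as they are.
OPTIONS = [
    "amazonec2-access-key=xxx",
    "amazonec2-secret-key=xxx",
    "amazonec2-region=eu-central-1",
    "amazonec2-use-private-address=true",
    "amazonec2-instance-type=t3.large",
    "engine-registry-mirror=http://172.31.39.74:6000",
]

# Print a shell-quoted command line that can be pasted on the runner manager.
print(shlex.join(build_create_command(OPTIONS, "manual-debug-1")))
```

Running the printed command manually should show the provisioning output directly, without the runner removing the machine as soon as creation fails.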
Thanks in advance
Edited by Max Fehmerling