docker+machine executor randomly fails
I tried both DigitalOcean and Google Cloud as docker-machine providers and the result is the same: builds sometimes succeed and other times they just fail silently. I found nothing in the gitlab-runner logs that explains the problem. The VM starts correctly, but it looks like the build itself fails to start. See the logs (with DigitalOcean):
Creating machine...
(runner-c0956f6e-auto-scale-runners-1464791742-e8f59816) Creating SSH key...
(runner-c0956f6e-auto-scale-runners-1464791742-e8f59816) Creating Digital Ocean droplet...
(runner-c0956f6e-auto-scale-runners-1464791742-e8f59816) Waiting for IP address to be assigned to the Droplet...
Waiting for machine to be running, this may take a few minutes...
- Grim reaper cleanup: pid=3168, wstatus=0
15145 Submitting build to coordinator... ok runner=c0956f6e
- Grim reaper cleanup: pid=3177, wstatus=0
WARNING: Failed to update executor docker+machine for c0956f6e wait: no child processes
- Grim reaper cleanup: pid=3182, wstatus=0
- Grim reaper cleanup: pid=3196, wstatus=0
- Grim reaper cleanup: pid=3205, wstatus=0
WARNING: Failed to update executor docker+machine for c0956f6e wait: no child processes
- Grim reaper cleanup: pid=3211, wstatus=0
- Grim reaper cleanup: pid=3224, wstatus=0
- Grim reaper cleanup: pid=3240, wstatus=0
- Grim reaper cleanup: pid=3253, wstatus=0
Detecting operating system of created instance...
Waiting for SSH to be available...
- Grim reaper cleanup: pid=3264, wstatus=0
WARNING: Failed to update executor docker+machine for c0956f6e wait: no child processes
- Grim reaper cleanup: pid=3270, wstatus=0
- Grim reaper cleanup: pid=3280, wstatus=0
- Grim reaper cleanup: pid=3285, wstatus=0
WARNING: Failed to update executor docker+machine for c0956f6e wait: no child processes
- Grim reaper cleanup: pid=3294, wstatus=0
WARNING: Failed to update executor docker+machine for c0956f6e wait: no child processes
- Grim reaper cleanup: pid=3300, wstatus=0
15145 Submitting build to coordinator... ok runner=c0956f6e
- Grim reaper cleanup: pid=3308, wstatus=0
- Grim reaper cleanup: pid=3314, wstatus=0
WARNING: Failed to update executor docker+machine for c0956f6e wait: no child processes
- Grim reaper cleanup: pid=3322, wstatus=0
WARNING: Failed to update executor docker+machine for c0956f6e wait: no child processes
- Grim reaper cleanup: pid=3327, wstatus=0
- Grim reaper cleanup: pid=3342, wstatus=0
Detecting the provisioner...
Provisioning with coreOS...
Copying certs to the local machine directory...
- Grim reaper cleanup: pid=3354, wstatus=0
WARNING: Failed to update executor docker+machine for c0956f6e wait: no child processes
- Grim reaper cleanup: pid=3361, wstatus=0
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Checking connection to Docker...
- Grim reaper cleanup: pid=3377, wstatus=0
- Grim reaper cleanup: pid=3383, wstatus=0
WARNING: Failed to update executor docker+machine for c0956f6e wait: no child processes
Docker is up and running!
To see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: docker-machine env runner-c0956f6e-auto-scale-runners-1464791742-e8f59816
- Grim reaper cleanup: pid=3140, wstatus=0
- Grim reaper cleanup: pid=3146, wstatus=0
- Received signal child exited
WARNING: Machine creation failed, trying to provision wait: no child processes name=runner-c0956f6e-auto-scale-runners-1464791742-e8f59816
Waiting for SSH to be available...
Detecting the provisioner...
Copying certs to the local machine directory...
Copying certs to the remote machine...
- Received signal child exited
- Grim reaper cleanup: pid=3419, wstatus=0
Setting Docker configuration on the remote daemon...
- Grim reaper cleanup: pid=3391, wstatus=0
- Grim reaper cleanup: pid=3396, wstatus=0
- Received signal child exited
WARNING: Machine creation failed, trying to provision wait: no child processes name=runner-c0956f6e-auto-scale-runners-1464791742-e8f59816
Waiting for SSH to be available...
Detecting the provisioner...
- Received signal child exited
- Grim reaper cleanup: pid=3457, wstatus=0
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
- Grim reaper cleanup: pid=3480, wstatus=0
- Grim reaper cleanup: pid=3439, wstatus=0
- Received signal child exited
Machine created name=runner-c0956f6e-auto-scale-runners-1464791742-e8f59816 time=55.196392528s
- Received signal child exited
- Grim reaper cleanup: pid=3499, wstatus=0
- Grim reaper cleanup: pid=3514, wstatus=0
- Grim reaper cleanup: pid=3523, wstatus=0
- Grim reaper cleanup: pid=3528, wstatus=0
- Received signal child exited
15145 Submitting build to coordinator... ok runner=c0956f6e
WARNING: Removing machine created=57.51331712s name=runner-c0956f6e-auto-scale-runners-1464791742-e8f59816 reason=Too many idle machines used=2.31692292s
- Received signal child exited
- Grim reaper cleanup: pid=3543, wstatus=0
- Received signal child exited
- Grim reaper cleanup: pid=3558, wstatus=0
About to remove runner-c0956f6e-auto-scale-runners-1464791742-e8f59816
Successfully removed runner-c0956f6e-auto-scale-runners-1464791742-e8f59816
Configuration (DigitalOcean):
concurrent = 50
[[runners]]
name = "auto-scale-runner"
url = "https://gitlab.foo.bar/ci"
token = "token"
executor = "docker+machine"
limit = 20
[runners.docker]
tls_verify = false
image = "docker:latest"
privileged = true
disable_cache = false
volumes = ["/cache"]
[runners.machine]
IdleCount = 0
MachineDriver = "digitalocean"
MachineName = "auto-scale-runners-%s"
MachineOptions = [
"digitalocean-image=coreos-beta",
"digitalocean-ssh-user=core",
"digitalocean-access-token=TOKEN",
"digitalocean-region=ams3",
"digitalocean-size=4gb",
"digitalocean-private-networking"
]
Configuration (Google Cloud):
concurrent = 50
[[runners]]
name = "auto-scale-runner"
url = "https://gitlab.foo.bar/ci"
token = "token"
executor = "docker+machine"
limit = 20
[runners.docker]
tls_verify = false
image = "docker:latest"
privileged = true
disable_cache = false
volumes = ["/cache"]
[runners.machine]
# IdleCount = 3 # There must be 3 machines in Idle state
# IdleTime = 600 # Each machine can be in Idle state up to 600 seconds (after this it will be removed)
# MaxBuilds = 100 # Each machine can handle up to 100 builds in a row (after this it will be removed)
MachineDriver = "google"
MachineName = "auto-scale-runners-%s"
MachineOptions = [
"google-project=igloo-project",
"google-machine-type=n1-highcpu-2",
"google-machine-image=projects/igloo-project/global/images/igloo-test-runner",
"google-zone=europe-west1-b"
]