Autoscale runner assumes machine removal succeeded even when it failed
I'm using the Autoscale feature with Google Compute Engine as my cloud provider. When the runner was scaling down after a recent build, I came across this error:
Mar 1 14:02:57 repo-server gitlab-runner[26527]: Removing machine
Mar 1 14:02:57 repo-server gitlab-ci-multi-runner: time="2016-03-01T14:02:57-05:00" level=warning msg="Removing machine" created=6m51.482793865s name=runner-bd061a1d-ci-docker-1456858565-949b4c25 reason="machine is unavailable" used=4m13.375792871s
Mar 1 14:02:57 repo-server gitlab-ci-multi-runner: About to remove runner-bd061a1d-ci-docker-1456858565-949b4c25
Mar 1 14:02:57 repo-server gitlab-ci-multi-runner: (runner-bd061a1d-ci-docker-1456858565-949b4c25) Deleting instance.
Mar 1 14:02:59 repo-server gitlab-ci-multi-runner: Error removing host "runner-bd061a1d-ci-docker-1456858565-949b4c25": googleapi: Error 503: Backend Error, backendError
Mar 1 14:02:59 repo-server gitlab-ci-multi-runner: Successfully removed runner-bd061a1d-ci-docker-1456858565-949b4c25
Mar 1 14:03:13 repo-server gitlab-runner[26527]: Removing machine
The removal of the machine failed because of an intermittent error on Google's side. Note that the failure was detected and logged in the "Error removing host ..." line. However, the very next line is "Successfully removed ...". The runner therefore believed the machine had been removed, and the VM stayed online indefinitely. If the runner detects a failure when removing a host, perhaps it should retry some number of times to avoid this situation.