Gracefully fail if the manager or runner machine disappears
Summary
During recent downgrades and other changes we have made to CI, builds have become stuck in a running state indefinitely. This happens when the runner or docker-machine disappears while a build is running. It can also happen when the runner hits API limits, as we have sometimes seen with GCE.
Expected behavior
GitLab and the runner should attempt to either fail or complete the running task.
If the machine cannot contact the runner that spawned it, it should try to do the following:
- Finish processing the work allocated to it
- Try reporting back again at the end of its run
- If no contact can be made, the machine should destroy itself and clean up after itself
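The three steps above can be sketched roughly as follows. This is only an illustration of the requested behavior, not how the runner is implemented: `finish_and_report`, `run_job`, `report`, `destroy_self`, and the retry parameters are all hypothetical names and values chosen for the example.

```python
import time
from typing import Callable


def finish_and_report(
    run_job: Callable[[], str],
    report: Callable[[str], bool],
    destroy_self: Callable[[], None],
    attempts: int = 5,            # assumed value, not from the issue
    delay_seconds: float = 30.0,  # assumed value, not from the issue
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if the result reached the runner, False if the machine gave up."""
    # Step 1: finish the allocated work even if the runner is unreachable.
    result = run_job()

    # Step 2: try to report back, pausing between failed attempts.
    for attempt in range(attempts):
        if report(result):
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)

    # Step 3: no contact could be made, so tear the machine down.
    destroy_self()
    return False
```

Passing `sleep` in as a parameter keeps the sketch testable without real delays. On a successful report the machine is left alone, on the assumption that the runner still owns its lifecycle in that case.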
If the runner cannot contact the machine, it should either try again with a new machine or fail the build.
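A minimal sketch of that runner-side behavior, under the assumption that losing contact surfaces as an exception. `MachineUnreachable`, `run_build`, `create_machine`, `run_on`, and the limit of two machines are hypothetical and only serve the example.

```python
from typing import Any, Callable


class MachineUnreachable(Exception):
    """Hypothetical error: the runner lost contact with a machine."""


def run_build(
    create_machine: Callable[[], Any],
    run_on: Callable[[Any], str],
    max_machines: int = 2,  # assumed limit, not from the issue
) -> str:
    """Run the build, retrying on a fresh machine; return a final build status."""
    for _ in range(max_machines):
        machine = create_machine()
        try:
            # run_on returns the build status, e.g. "success" or "failed".
            return run_on(machine)
        except MachineUnreachable:
            # The machine vanished mid-build: try again on a new one.
            continue
    # Every machine was lost: fail the build instead of leaving it running.
    return "failed"
```

Either way the build ends in a terminal status, which is the point of the request: it is never left in a running state.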
If the runner hits API limits, it should slow down and back off before trying to create more machines.
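One common way to slow down is exponential backoff with jitter. The issue does not prescribe a particular strategy, so the sketch below is just one option; `RateLimited`, `create_with_backoff`, and the delay values are hypothetical.

```python
import random
import time
from typing import Any, Callable


class RateLimited(Exception):
    """Hypothetical error: the cloud provider rejected the call due to API limits."""


def create_with_backoff(
    create_machine: Callable[[], Any],
    max_attempts: int = 6,    # assumed value
    base_delay: float = 1.0,  # seconds, assumed value
    max_delay: float = 60.0,  # seconds, assumed value
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> Any:
    """Create a machine, waiting longer after each rate-limit error."""
    for attempt in range(max_attempts):
        try:
            return create_machine()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # give up and let the caller fail the build
            # Double the delay each time, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Scale by a factor in [0.5, 1.0) so parallel callers spread out.
            sleep(delay * (0.5 + 0.5 * jitter()))
```

The jitter keeps several runners that hit the limit at the same moment from retrying in lockstep.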