GitLab Runner should gracefully handle spot price changes with AWS Autoscale runner
Note: Solution in ~upstream https://github.com/docker/machine/pull/4312 has been merged, but not released yet.
A customer (https://gitlab.my.salesforce.com/0016100000SEjM1) saw an issue where, when the spot price rose, their autoscale runner failed to create new machines:
/var/log/syslog.4.gz:Sep 14 08:42:07 ip-172-16-3-69 gitlab-runner[23169]: time="2017-09-14T08:42:07Z" level=error msg="Error creating machine: Error in driver during machine creation: Error fulfilling spot request: InvalidSpotInstanceRequestID.NotFound: The spot instance request ID 'sir-y4k842sp' does not exist" driver=amazonec2 name=runner-13aa6052-as-spot-eu-west-1c-1505378524-994cd39c operation=create #012<nil>
The retries eventually use up 60 requests, after which AWS won't accept any more. Then, once the spot price is acceptable again, we remain locked out for a while because we exceeded the request limit.
They use the following command to see which machines are in the Error state:
docker-machine ls -q --filter state=Error --format "{{.NAME}}"
If left alone, it will eventually self-heal.
GitLab Runner should back off exponentially (or use a similar strategy) and/or have a timeout after which it stops retrying.
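To make the request concrete, here is a minimal sketch of the retry behaviour being asked for: exponential backoff with jitter, plus both an attempt limit and an overall deadline. It is illustrative only and is not GitLab Runner's or docker-machine's actual code. The names (`create_with_backoff`, `backoff_delays`, `GiveUp`) and all numeric defaults are assumptions chosen for the example, and `create` stands in for whatever call provisions the machine.

```python
import random
import time


class GiveUp(Exception):
    """Raised when the retry budget (attempts or deadline) is exhausted."""


def backoff_delays(base=5.0, factor=2.0, cap=300.0):
    """Yield exponentially growing delays in seconds, never exceeding `cap`."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def create_with_backoff(create, max_attempts=8, deadline=1800.0,
                        sleep=time.sleep, clock=time.monotonic,
                        jitter=random.random):
    """Call `create()` until it succeeds, waiting longer after each failure.

    `create` is a hypothetical callable that raises on failure, for example
    when a spot request cannot be fulfilled. Retrying stops after
    `max_attempts` calls or once the next wait would pass `deadline` seconds.
    """
    start = clock()
    delays = backoff_delays()
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return create()
        except Exception as err:  # a real caller would catch a narrower type
            last_error = err
        if attempt == max_attempts:
            break
        # Scale the delay to 50-100% of its nominal value so that many
        # runners failing at once do not all retry at the same instant.
        delay = next(delays) * (0.5 + 0.5 * jitter())
        if clock() - start + delay > deadline:
            break
        sleep(delay)
    raise GiveUp(f"gave up after {attempt} attempt(s)") from last_error
```

With these assumed defaults the nominal waits are 5, 10, 20, 40, 80, 160 and 300 seconds, so a price spike costs at most 8 creation attempts in roughly ten minutes instead of a continuous stream of requests.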
Edited by Fabio Busatto