GitLab Runner should gracefully handle spot price changes with AWS Autoscale runner

Note: Solution in ~upstream https://github.com/docker/machine/pull/4312 has been merged, but not released yet.

A customer (https://gitlab.my.salesforce.com/0016100000SEjM1) saw an issue where, when the spot price rose, their autoscale runner failed to create new machines:

/var/log/syslog.4.gz:Sep 14 08:42:07 ip-172-16-3-69 gitlab-runner[23169]: time="2017-09-14T08:42:07Z" level=error msg="Error creating machine: Error in driver during machine creation: Error fulfilling spot request: InvalidSpotInstanceRequestID.NotFound: The spot instance request ID 'sir-y4k842sp' does not exist" driver=amazonec2 name=runner-13aa6052-as-spot-eu-west-1c-1505378524-994cd39c operation=create #012<nil>

The repeated failed attempts eventually use up 60 requests, after which AWS won't accept any more. Once the spot price is acceptable again, we stay locked out for a while because we exceeded the request limit.

They run docker-machine ls -q --filter state=Error --format "{{.NAME}}" to list the machines that are in the Error state.

If left alone, the runner eventually self-heals.

GitLab Runner should back off (exponentially, or in some similar way) between failed creation attempts, and/or have a timeout after which it stops trying.
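
To illustrate the kind of behaviour being requested, here is a minimal sketch of retrying with exponential backoff plus an attempt limit and an overall deadline. It is not GitLab Runner code: the names SpotRequestError, GiveUp, backoff_delays and create_with_backoff, and all the default values (5 s base delay, 300 s cap, 6 attempts, 900 s deadline), are assumptions made up for this example. The create argument stands in for whatever call actually requests the machine.

```python
import random
import time


class SpotRequestError(Exception):
    """Hypothetical error standing in for a failed spot request."""


class GiveUp(Exception):
    """Raised when the retry budget (attempts or deadline) is exhausted."""


def backoff_delays(base=5.0, factor=2.0, cap=300.0):
    """Yield base, base*factor, base*factor^2, ... never exceeding cap seconds."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def create_with_backoff(create, max_attempts=6, deadline=900.0,
                        sleep=time.sleep, clock=time.monotonic,
                        jitter=random.random):
    """Call create() until it succeeds, waiting longer after each failure.

    Stops with GiveUp once max_attempts is reached or the next wait would
    push the total elapsed time past deadline (in seconds).
    """
    start = clock()
    delays = backoff_delays()
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return create()
        except SpotRequestError as err:
            last_error = err
        if attempt == max_attempts:
            break
        # Scale the wait to 50-100% of the nominal delay so that several
        # runners failing at once do not retry in lockstep.
        wait = next(delays) * (0.5 + 0.5 * jitter())
        if clock() - start + wait > deadline:
            break
        sleep(wait)
    raise GiveUp("stopped retrying machine creation") from last_error
```

With these defaults a runner would make at most six requests over a few minutes instead of retrying immediately, which should keep it well under a request limit like the one described above. The sleep, clock and jitter parameters are injectable only so the behaviour can be checked without real waiting.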

Edited by Fabio Busatto