Autoscaler: Healthcheck prior to using instance

Description

With our new taskscaler/fleeting-based autoscaler executors (instance, docker-autoscaler), we're seeing issues where an instance is used for a job even though it has already been removed externally (typically by a spot instance termination).

Whilst we do our best to detect external removal, detection relies on polling, so it can take some time before the autoscaler learns that an instance is gone.

docker-machine didn't have this problem because its connection to the Docker daemon essentially acted as a continuous health check.

Proposal

Try to replicate the behaviour of docker-machine to some extent, ensuring that we only accept a job if the connection to the remote instance hasn't been terminated.

Links to related issues and merge requests / references