Autoscaler: Healthcheck prior to using instance
Description
With our new taskscaler/fleeting-based autoscaler executors (instance, docker-autoscaler), we're seeing issues where an instance is used despite having been removed externally (typically through spot instance termination).
Whilst we do our best to detect this, detection relies on polling, so it can take some time to receive the update.
docker-machine didn't have this problem due to the connection with the docker daemon essentially being a continuous health-check.
Proposal
Try to replicate the behaviour of docker-machine to some extent, ensuring that we only accept a job if the connection to the remote instance hasn't been terminated.
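
As a rough illustration of the idea (not the runner's actual implementation), the sketch below checks whether an already-open connection to an instance has been closed by the peer before handing that instance a job. The names connection_alive and acquire_instance, and the mapping of instance names to sockets, are hypothetical and chosen only for this example. Note that a check like this only notices a close or reset that the peer actually sent; a host that vanishes silently would still need keepalives or an active probe.

```python
import select
import socket
from typing import Dict, Optional


def connection_alive(sock: socket.socket) -> bool:
    """Return False if the peer has closed or reset the connection."""
    try:
        # Zero timeout: ask whether the socket is readable right now.
        readable, _, _ = select.select([sock], [], [], 0)
        if not readable:
            # Nothing pending, so no close notification has arrived.
            return True
        # Peek without consuming data; an empty read means the peer closed.
        return sock.recv(1, socket.MSG_PEEK) != b""
    except (OSError, ValueError):
        # Reset connection, or a socket that was already closed locally.
        return False


def acquire_instance(connections: Dict[str, socket.socket]) -> Optional[str]:
    """Return the first instance whose connection is still open, else None.

    `connections` is a hypothetical map of instance name to open socket.
    """
    for name, sock in connections.items():
        if connection_alive(sock):
            return name
    return None
```

A caller would run acquire_instance just before accepting a job and decline the job if it returns None, rather than waiting for the next polling cycle to report the removal.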