Add SSH connect retry

Axel von Bertoldi requested to merge avonbertoldi/34/ssh-coonect-retry into master

A number of users have reported job failures caused by failed SSH connection attempts. Most of the time the cause is that the SSH server is not yet ready to accept connections, often because the container's entrypoint script performs other time-consuming work before starting the SSH server.

Wait until fargate task is healthy if a healthc... (!85 - merged) adds the ability to define a container health-check, and to wait for the health-check to finish/succeed. While that is a more general solution to this problem, it does require user intervention. SSH connection failure seems to be a common enough issue that it warrants its own solution that does not require user intervention.

Note that I have not added any configuration option. The retry mechanism is hard-coded to 5 retries with exponential backoff starting at one second. This will give a total of 16 seconds before the final attempt. I'm open to making the number of attempts configurable if enough people think it's needed.

Related:

Closes Sporadic SSH connection refused (#34 - closed)

Edited by Axel von Bertoldi