Replace restart with restart_policy on compose file

This has two motivations:

  1. According to the documentation, the restart option is ignored when deploying in swarm mode with a version 3 compose file. See here: https://docs.docker.com/compose/compose-file/#restart

  2. It helps us handle, to some extent, dependencies between services.

On 1, the documentation already recommends using restart_policy instead. Reference: https://docs.docker.com/compose/compose-file/#restart_policy
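
As a rough sketch of what the change looks like for one service (the image name and the condition value are illustrative assumptions, not copied from this repository's compose file):

```yaml
version: "3"

services:
  postgres:
    image: postgres        # assumed image, for illustration only
    # Before: ignored when deploying in swarm mode
    # restart: always
    # After: the restart policy lives under the deploy key
    deploy:
      restart_policy:
        condition: any     # assumed value; on-failure and none are the other options
```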

Regarding 2, we were facing a problem with dependencies between the services we start. We were relying solely on the depends_on option to express a precedence relationship, but this does not work. This option only ensures that a service is running before the others start, not that it is "ready". For example, if kong depends on postgres, kong will not start before postgres is running. However, we need postgres to be fully operational before trying to start kong. This state of full operation is the "ready" state.

The Docker documentation (https://docs.docker.com/compose/startup-order/) says that the best solution is to have the service attempt to re-establish the connection to its dependency, but it also gives the option of using wait-for-it scripts. At first, we tried to solve the problem with those scripts, but that would have been a huge amount of work: some services depend on more than one service being "ready", we don't control the Dockerfiles of all of them, and they run on different operating systems (making it difficult to develop a single solution for all services). Those scripts also tend to become very complicated when the command we want to run is a more complex one (like chaining commands with the & operator, but maybe it's only my bash programming that is not as good as it was before).

Therefore, we went back to the first approach: attempting to re-establish the connection. We observed that, when the manager tries to start a service and it fails, the default behaviour is to retry automatically. This led to a deadlock: the service could never start successfully if its dependency was not ready before it. So we experimented with the delay and window options of restart_policy, aiming to make the manager try to start other services while the one that failed is in a waiting period. We chose the waiting values on the logic that the "base" (postgres, mongodb, redis) and "independent" (kong-api-gateway, helloworld) services should not wait long before retrying: they need to be started as soon as possible. The others need to wait longer because, if they are scheduled to start before the "base" ones, the waiting period must be long enough for the manager to schedule the "base" ones.
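
To illustrate the idea, a minimal sketch of the two kinds of policy follows. The image names and every duration are assumptions chosen only to show a short wait for a "base" service and a longer wait for a service that depends on it; they are not the values used in this merge request.

```yaml
version: "3"

services:
  # "Base" service: retry quickly so it becomes ready as soon as possible
  postgres:
    image: postgres          # assumed image
    deploy:
      restart_policy:
        condition: on-failure
        delay: 5s            # hypothetical short wait between restart attempts
        window: 10s          # hypothetical time to wait before deciding a restart succeeded

  # Dependent service: wait longer so the manager can schedule the base services first
  kong:
    image: kong              # assumed image
    depends_on:
      - postgres
    deploy:
      restart_policy:
        condition: on-failure
        delay: 30s           # hypothetical longer wait between restart attempts
        window: 60s          # hypothetical longer evaluation window
```

The longer delay on the dependent service is what gives the manager room to start the base services in the meantime.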

Turns out this approach has worked as expected with much less work.

Needs to be merged after !42 (merged)

Part of #31 (closed)
