Replace restart with restart_policy on compose file
This has two motivations:

1. According to the documentation, the `restart` option is ignored when
   deploying in swarm mode with a version 3 compose file. See
   https://docs.docker.com/compose/compose-file/#restart
2. This helps us to somewhat handle dependencies between services.

Regarding 1, the documentation already recommends using `restart_policy`
instead. Reference:
https://docs.docker.com/compose/compose-file/#restart_policy
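
As a rough illustration of the change, a service entry might move from `restart` to `deploy.restart_policy` along these lines. This is only a sketch: the image name and the `condition` value are assumptions, not the values used in this merge request.

```yaml
version: "3"
services:
  postgres:
    image: postgres          # illustrative image name
    # Before: ignored when deploying in swarm mode with a version 3 file
    # restart: always
    deploy:
      restart_policy:
        condition: any       # assumed rough equivalent of "restart: always"
```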
Regarding 2, we were facing a problem related to dependencies between
the services we start. We were relying solely on the `depends_on` option
to express a precedence relationship, but this does not work. This
option only ensures that a service is running before the others start,
not that it is "ready". This means, for example, that if kong depends on
postgres, it will not start before postgres is running. However, we need
postgres fully operational before trying to start kong. This state of
full operation is the "ready" state.
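
For reference, the kind of `depends_on` declaration described above looks roughly like the following. The service and image names are illustrative and assume a setup similar to ours.

```yaml
version: "3"
services:
  postgres:
    image: postgres          # illustrative image name
  kong:
    image: kong              # illustrative image name
    depends_on:
      - postgres             # only orders startup; does not wait for "ready"
```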
The Docker documentation (https://docs.docker.com/compose/startup-order/)
says that the best solution is to attempt to re-establish the connection
to the service, but it also gives the option of using `wait-for-it`
scripts.
At first, we tried to solve the problem with those scripts, but that
would be a huge amount of work: some services depend on more than one
service being "ready", we don't control the Dockerfiles of all of them,
they run on different operating systems (making it difficult to develop
a single solution for all services), and those scripts tend to get very
complicated if the command we want to run is a more complex one (like
chaining commands with the `&` operator, but maybe it's only my bash
programming that is not as good as it was before).
Therefore, we went back to the first approach: attempting to
re-establish the connection. We observed that, if the manager tries to
start a service, the default behaviour in case of failure is to
automatically retry. This led to a deadlock: the service could never
start successfully if its dependency was not ready before it. So, we
experimented with the `delay` and `window` options of the
`restart_policy`, aiming to make the manager try to start other services
while the one that failed is in a waiting period. We chose the waiting
values with the logic that the "base" (postgres, mongodb, redis) and
"independent" (kong-api-gateway, helloworld) services do not wait long
before retrying: they need to be started as soon as possible. The others
need to wait longer because, if they are scheduled to be started before
the "base" ones, they need to wait long enough for the manager to
schedule the "base" ones to be started.
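
The waiting logic above can be sketched as follows. Every value here is illustrative rather than what this merge request actually sets, `some-dependent-service` and its image are hypothetical placeholders for "the others", and `condition: on-failure` is an assumption.

```yaml
version: "3"
services:
  # "Base" service: short wait so it is retried as soon as possible.
  postgres:
    image: postgres                          # illustrative image name
    deploy:
      restart_policy:
        condition: on-failure                # assumed condition
        delay: 5s                            # illustrative value
        window: 30s                          # illustrative value

  # Hypothetical dependent service: longer wait so the manager has time
  # to schedule the "base" services in the meantime.
  some-dependent-service:
    image: example/some-dependent-service    # hypothetical image
    deploy:
      restart_policy:
        condition: on-failure                # assumed condition
        delay: 30s                           # illustrative value
        window: 120s                         # illustrative value
```

Here `delay` is the time to wait between restart attempts and `window` is how long to wait before deciding whether a restart has succeeded.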
Turns out this approach has worked as expected with much less work.
Needs to be merged after !42 (merged)
Part of #31 (closed)