Documentation on HA and Docker is lacking
From the discussion in https://gitlab.com/gitlab-org/gitlab-ce/issues/41897.
We use Rancher+Docker and run 4 gitlab/gitlab-ee:version containers with a shared underlying filesystem.
In this case, the containers for 10.2.x were destroyed, the new ones running 10.3.x were up, and migrations were run manually in one of the containers with gitlab-rake db:migrate. I ran a process which cloned the current containers and then switched traffic to them. Basically, the containers got recreated again from scratch. This is when the problem was solved.
Usually though, if we ever need to restart (which doesn't happen often), we restart the Docker containers.
It sounds like one container started up and loaded the database schema before the migrations were complete. Unicorn and Sidekiq processes should be HUP'ed or restarted once the migrations have finished, or you get the odd errors that you see. It sounds like we need to write better documentation around this in an HA environment.
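As a starting point for that documentation, the sequence described above could be sketched roughly as follows. This is only a sketch under assumptions, not a tested procedure: the container names are hypothetical placeholders for whatever Rancher assigns, the function names are made up for illustration, and it assumes the Omnibus-based image where gitlab-rake and gitlab-ctl are available inside each container.

```shell
#!/bin/sh
# Hypothetical container names; substitute the ones your Rancher stack uses.
CONTAINERS="${CONTAINERS:-gitlab-1 gitlab-2 gitlab-3 gitlab-4}"
# Overridable so the sequence can be dry-run (for example DOCKER=echo).
DOCKER="${DOCKER:-docker}"

# Run the migrations exactly once, in the first container of the list.
run_migrations() {
    first=${CONTAINERS%% *}
    $DOCKER exec "$first" gitlab-rake db:migrate
}

# After the migrations finish, make every container pick up the new schema
# by HUP'ing Unicorn and restarting Sidekiq.
reload_all() {
    for c in $CONTAINERS; do
        $DOCKER exec "$c" gitlab-ctl hup unicorn
        $DOCKER exec "$c" gitlab-ctl restart sidekiq
    done
}

# Only reload the application processes if the migrations succeeded.
upgrade() {
    run_migrations && reload_all
}
```

Calling upgrade after the 10.3.x containers are up would run the migration in one container and then reload the rest. Setting DOCKER=echo first prints the commands instead of executing them, which is a cheap way to check the order before touching a live cluster.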