Error when git_data_dirs has a mix of local / remote gitalies (or set gitaly_address to non-Unix socket address)
Inter-gitaly communication works by passing gitaly server metadata through the RPC request context.
For example, gitaly-1 needs to talk to gitaly-2 to call FetchInternalRemote.
Let's say git_data_dirs is the following:
git_data_dirs({
  "gitaly-1" => {
    "gitaly_address" => "tcp://10.23.223.41:2305",
    "gitaly_token" => 'xyzabc'
  },
  "gitaly-2" => {
    "gitaly_address" => "tcp://10.23.243.98:2305",
    "gitaly_token" => 'xyzabc'
  }
})
In this case, gitaly-2's information is sent through the request context, so gitaly-1 finds that gitaly-2's address is tcp://10.23.243.98:2305 and dials that address. Everything is good.
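To make the lookup concrete, here is a minimal sketch of how the storage information could travel with a request and be resolved on the receiving side. The helper names (encode_servers, address_for) and the JSON encoding are assumptions made for illustration; they are not taken from the Gitaly code base.

```ruby
require "json"

# Hypothetical helper: serialise storage name => address/token so it can be
# attached to the request context by the caller.
def encode_servers(git_data_dirs)
  servers = git_data_dirs.transform_values do |cfg|
    { "address" => cfg["gitaly_address"], "token" => cfg["gitaly_token"] }
  end
  JSON.generate(servers)
end

# Hypothetical helper: on the receiving gitaly, find the address to dial
# for another storage from the metadata that came with the request.
def address_for(encoded_servers, storage)
  JSON.parse(encoded_servers).fetch(storage).fetch("address")
end

git_data_dirs = {
  "gitaly-1" => { "gitaly_address" => "tcp://10.23.223.41:2305", "gitaly_token" => "xyzabc" },
  "gitaly-2" => { "gitaly_address" => "tcp://10.23.243.98:2305", "gitaly_token" => "xyzabc" }
}

metadata = encode_servers(git_data_dirs)
puts address_for(metadata, "gitaly-2") # prints "tcp://10.23.243.98:2305"
```

Because both entries carry a TCP address, the value gitaly-1 reads from the metadata is reachable from any machine.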
However, let's say we have the following instead:
git_data_dirs({
  "gitaly-1" => {
    "gitaly_address" => "tcp://10.23.223.41:2305",
    "gitaly_token" => 'xyzabc'
  },
  "gitaly-2" => {
    "path" => "/var/opt/gitlab/git-data"
  }
})
This is a case where we are using a mix of local and remote gitalies. This breaks inter-gitaly communication, because gitaly-2's address defaults to the Unix socket, /var/opt/gitlab/gitaly/gitaly.socket.
So /var/opt/gitlab/gitaly/gitaly.socket gets passed to gitaly-1, and gitaly-1 dials /var/opt/gitlab/gitaly/gitaly.socket. Since gitaly-1 runs on a remote machine, the path resolves to the socket on that machine. Now we have gitaly-1 calling its own socket, thinking it's gitaly-2. Not good!
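The failure can be reproduced with a small simulation. This is only a sketch: TCP_LISTENERS, effective_address, and reached_from are hypothetical names, and the only facts taken from the description above are the default socket path and that a socket path is always resolved on the machine doing the dialing.

```ruby
DEFAULT_SOCKET = "/var/opt/gitlab/gitaly/gitaly.socket"

# Hypothetical map of which machine listens on each TCP address.
TCP_LISTENERS = { "tcp://10.23.223.41:2305" => "gitaly-1" }.freeze

# A storage without gitaly_address falls back to the local Unix socket.
def effective_address(cfg)
  cfg["gitaly_address"] || DEFAULT_SOCKET
end

# Returns the machine that actually answers when dialing_host dials address.
def reached_from(dialing_host, address)
  return TCP_LISTENERS.fetch(address) if address.start_with?("tcp://")

  # A socket path is a file on the dialing machine, so the call never leaves it.
  dialing_host
end

git_data_dirs = {
  "gitaly-1" => { "gitaly_address" => "tcp://10.23.223.41:2305", "gitaly_token" => "xyzabc" },
  "gitaly-2" => { "path" => "/var/opt/gitlab/git-data" }
}

target = effective_address(git_data_dirs["gitaly-2"])
puts reached_from("gitaly-1", target) # prints "gitaly-1", not "gitaly-2"
```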
Apart from a docs update, we can add validation in omnibus that raises an error on gitlab-ctl reconfigure, so that customers do not end up in this scenario.
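One possible shape for such a check is sketched below. It is not the actual omnibus implementation: validate_git_data_dirs! and network_address? are hypothetical names, and treating only tcp:// and tls:// addresses as "remote" is an assumption of this sketch.

```ruby
# Sketch only: a hypothetical check, not the actual omnibus-gitlab code.
NETWORK_SCHEMES = %w[tcp tls].freeze

# True when the address is reachable over the network rather than a local socket.
def network_address?(address)
  NETWORK_SCHEMES.any? { |scheme| address.to_s.start_with?("#{scheme}://") }
end

# Raises when some storages have a network gitaly_address and others do not.
def validate_git_data_dirs!(git_data_dirs)
  remote, local = git_data_dirs.partition do |_name, cfg|
    network_address?(cfg["gitaly_address"])
  end
  return if remote.empty? || local.empty?

  names = local.map(&:first).join(", ")
  raise ArgumentError,
        "git_data_dirs mixes local and remote Gitaly storages; " \
        "set a tcp:// or tls:// gitaly_address for: #{names}"
end
```

With the second configuration from this issue, the check would raise and name gitaly-2 as the storage missing a network address; an all-local or all-remote configuration passes unchanged.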