How do we do data transfers between Gitaly servers?
Why do we need this: #477 (closed)
Why do we need to solve this now: the migration project is our top priority. We cannot wrap up the migration without supporting repository forks. So, we need to start building this.
In the future we also want to have replication. It would be nice if what we come up with here can be reused for that, but I don't think we should try to do too many things at once here.
How to transfer data from Gitaly A to Gitaly B.
Option 1: Client establishes connections to both A and B and relays the data between them. ‘V shape’
Pro
- Makes no new assumptions about network topology
Con
- Wasteful use of client-server bandwidth
- Waste of client CPU/RAM
Option 2: Client calls A, A connects to B. (Or: client calls B, B calls A.)
Pro
- No wasted bandwidth, CPU, RAM
Con
- Assumes that servers can connect to servers
  - Name/address resolution
  - Firewalls must not get in the way
  - Authentication: solvable. The client can pass a token for server A when making a call to server B, so that B can authenticate to A.
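The token hand-off for Option 2 could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `server` and `forkRequest` types, their fields, and the `fetch` / `fork` methods are hypothetical in-memory stand-ins, not Gitaly RPCs.

```go
package main

import (
	"errors"
	"fmt"
)

// server is a hypothetical in-memory stand-in for a Gitaly server.
type server struct {
	name  string
	token string            // secret that callers must present
	repos map[string]string // repository path -> contents
}

// fetch is what a peer calls on the source server; the token is checked first.
func (s *server) fetch(token, repo string) (string, error) {
	if token != s.token {
		return "", errors.New("permission denied by " + s.name)
	}
	data, ok := s.repos[repo]
	if !ok {
		return "", errors.New("repository not found on " + s.name)
	}
	return data, nil
}

// forkRequest is the hypothetical message the client sends to the target.
// In a real deployment, source would be an address that the target must
// be able to resolve and reach through any firewalls.
type forkRequest struct {
	source      *server
	sourceToken string // token for the source, supplied by the client
	sourceRepo  string
	targetRepo  string
}

// fork runs on the target server: it authenticates to the source with
// the token the client passed along, then stores the data locally.
func (s *server) fork(req forkRequest) error {
	data, err := req.source.fetch(req.sourceToken, req.sourceRepo)
	if err != nil {
		return fmt.Errorf("fetch from %s: %w", req.source.name, err)
	}
	s.repos[req.targetRepo] = data
	return nil
}

func main() {
	a := &server{name: "A", token: "token-for-a", repos: map[string]string{"group/project.git": "objects"}}
	b := &server{name: "B", token: "token-for-b", repos: map[string]string{}}

	// The client calls B only; B calls A directly.
	err := b.fork(forkRequest{
		source:      a,
		sourceToken: "token-for-a",
		sourceRepo:  "group/project.git",
		targetRepo:  "user/project.git",
	})
	fmt.Println("fork error:", err)
}
```

The client sends only a small request; the repository data flows directly from A to B. The same shape works in the other direction (client calls A, A pushes to B).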
Git specifics
We already have a way to use "git over SSH" via Gitaly, without needing an actual SSH server or client. This is how we test our Gitaly SSH RPCs. Adapting this to the server-server data transfer case should be straightforward.
The harder question (in my view) is how we organize the requests.