Perform Geo replication through an automatically-configured VPN

Description

Currently, Geo state transfer from primary to secondary can happen over the public Internet, so it must be encrypted. We use a number of mechanisms, including:

  • PostgreSQL replication
  • Repository synchronization over HTTPS or SSH (deprecated, to be removed in %10.3)
  • File synchronization over HTTPS
  • Manual file rsyncs (deprecated)

Each of these mechanisms needs its encryption and trust configured separately, which is painful. PostgreSQL replication, in particular, was left unencrypted for an extended time. Additionally, exposing these Geo-required services to the world (TCP port 5432 for PostgreSQL, the Geo sync API, etc.) is needlessly risky.
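As a rough way to audit the exposure described above, the sketch below checks whether a TCP port accepts connections from wherever the script runs. The function name `port_is_open` and the default timeout are illustrative assumptions, not part of Geo or Omnibus.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the TCP handshake; we close immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Run from a machine outside the primary's network, something like `port_is_open("primary.example.com", 5432)` (hostname hypothetical) would indicate whether PostgreSQL is reachable from the public Internet.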

There are also some services that don't support encryption natively, such as Redis. This means that even if we managed to work out how to replicate it sensibly (see https://gitlab.com/gitlab-org/gitlab-ee/issues/4070), we would still have to secure the transport ourselves.

Finally, we have difficulty with RFC1918 vs. public IP addresses in the documentation. It's not clear what people should be using and why.
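To make the RFC1918 vs. public distinction concrete, here is a minimal sketch using Python's standard `ipaddress` module. The helper name `is_rfc1918` is an assumption for illustration. It checks the three RFC 1918 ranges explicitly, because `ipaddress`'s own `is_private` attribute also covers loopback and other reserved ranges.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if address is an IPv4 address inside an RFC 1918 range."""
    ip = ipaddress.ip_address(address)
    return ip.version == 4 and any(ip in net for net in RFC1918_NETWORKS)
```

For example, `is_rfc1918("172.31.255.255")` is true while `is_rfc1918("172.32.0.1")` is not, since the second range is a /12 rather than a /8.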

Proposal

Introduce a VPN setup on the primary. This would need to take account of HA to avoid being a bottleneck, so OpenVPN may not be suitable. Alternatives

The secondaries would establish a tunnel connection to the primary and all Geo traffic would happen through this tunnel. Attempts to sync anything via the public IP of the primary would fail.
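The "tunnel only" rule above could be expressed as a simple source-address check. This is a sketch under stated assumptions: the subnet `10.8.0.0/24` and the function name `geo_request_allowed` are hypothetical, and the proposal does not say where such a check would live (firewall, reverse proxy, or application).

```python
import ipaddress

# Hypothetical tunnel subnet; the proposal does not specify one.
TUNNEL_NETWORK = ipaddress.ip_network("10.8.0.0/24")


def geo_request_allowed(source_address: str) -> bool:
    """Allow Geo sync traffic only from peers inside the tunnel subnet."""
    return ipaddress.ip_address(source_address) in TUNNEL_NETWORK
```

A secondary connecting through the tunnel (say, from `10.8.0.7`) would pass, while the same secondary connecting from its public address would be rejected.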

This is a big change and difficult to get right. It would require significant gitlab-omnibus work (/cc @marin), but it may eventually be necessary in any case, and it would allow us to rationalise the setup and simplify the documentation while also improving the security characteristics of Geo (/cc @briann).

Links / references

/cc @jramsay
