GitLab Geo database replication
Note: This is the documentation for installations from source. For installations using the Omnibus GitLab packages, follow the database replication for Omnibus GitLab guide.
Note: Stages of the setup process must be completed in the documented order. Before attempting the steps in this stage, complete all prior stages.
- Install GitLab Enterprise Edition on the server that will serve as the secondary Geo node. Do not log in or set up anything else in the secondary node for the moment.
- Upload the GitLab License you purchased for GitLab Enterprise Edition to unlock GitLab Geo.
- Set up the database replication topology (primary (read-write) <-> secondary (read-only))
- Configure SSH authorizations to use the database
- Configure GitLab to set the primary and secondary nodes.
- Follow the after setup steps.
This document describes the minimal steps you have to take in order to replicate your GitLab database into another server. You may have to change some values according to your database setup, how big it is, etc.
You are encouraged to first read through all the steps before executing them in your testing/production environment.
The GitLab primary node, where the write operations happen, will connect to the primary database server. The secondary nodes, which are read-only, will connect to secondary database servers (which are read-only too).
Note: In the documentation of many databases you will see "primary" referred to as "master" and "secondary" as either "slave" or "standby" server (read-only).
Since GitLab 9.4: We recommend using PostgreSQL replication slots to ensure the primary retains all the data necessary for the secondaries to recover. See below for more details.
The following guide assumes that:
- You are using PostgreSQL 9.6 or later which includes the
- You have a primary server already set up (the GitLab server you are replicating from), and you have a new secondary server set up on the same OS and PostgreSQL version. Also make sure the GitLab version is the same on all nodes.
- The IP of the primary server for our examples will be 126.96.36.199, whereas the secondary's IP will be 188.8.131.52. Note that the primary and secondary servers must be able to communicate over these addresses. These IP addresses can either be public or private.
Step 1. Configure the primary server
SSH into your GitLab primary server and login as root:
Add this node as the Geo primary by running:
bundle exec rake geo:set_primary_node
Create a replication user named
sudo -u postgres psql -c "CREATE USER gitlab_replicator REPLICATION ENCRYPTED PASSWORD 'thepassword';"
Set up TLS support for the PostgreSQL primary server
Warning: Only skip this step if you know that PostgreSQL traffic between the primary and secondary will be secured through some other means, e.g., a known-safe physical network path or a site-to-site VPN that you have configured.
If you are replicating your database across the open Internet, it is essential that the connection is TLS-secured. Correctly configured, this provides protection against both passive eavesdroppers and active "man-in-the-middle" attackers.
To do this, PostgreSQL needs to be provided with a key and certificate to use. You can re-use the same files you're using for your main GitLab instance, or generate a self-signed certificate just for PostgreSQL's use.
Prefer the first option if you already have a long-lived certificate. Prefer the second if your certificates expire regularly (e.g. LetsEncrypt), or if PostgreSQL is running on a different server to the main GitLab services (this may be the case in a HA configuration, for instance).
To generate a self-signed certificate and key, run this command:
openssl req -nodes -batch -x509 -newkey rsa:4096 -keyout server.key -out server.crt -days 3650
This will create two files, server.key and server.crt, that you can use for authentication.
PostgreSQL's permission requirements are very strict, so whether you're re-using your certificates or just generated new ones, copy the files to the correct location. Do check that the destination path below is correct!
If you're re-using certificates already in GitLab, they are likely to be in the /etc/ssl directory. If your domain is primary.geo.example.com, the commands would be:
# Copying a certificate and key currently used by GitLab
install -o postgres -g postgres -m 0400 -T /etc/ssl/certs/primary.geo.example.com.crt ~postgres/9.x/main/data/server.crt
install -o postgres -g postgres -m 0400 -T /etc/ssl/private/primary.geo.example.com.key ~postgres/9.x/main/data/server.key
If you just generated a self-signed certificate and key, the files will be in your current working directory, so run:
# Copying a self-signed certificate and key
install -o postgres -g postgres -m 0400 -T server.crt ~postgres/9.x/main/data/server.crt
install -o postgres -g postgres -m 0400 -T server.key ~postgres/9.x/main/data/server.key
Add this configuration to postgresql.conf, removing any existing configuration for
ssl = on
ssl_cert_file='server.crt'
ssl_key_file='server.key'
Edit postgresql.conf to configure the primary server for streaming replication (for Debian/Ubuntu that would be
listen_addresses = '126.96.36.199'
wal_level = hot_standby
max_wal_senders = 5
min_wal_size = 80MB
max_wal_size = 1GB
max_replication_slots = 1 # Number of Geo secondary nodes
wal_keep_segments = 10
hot_standby = on
Be sure to set max_replication_slots to the number of Geo secondary nodes that you may potentially have (at least 1).
For security reasons, PostgreSQL by default only listens on the local interface (e.g. 127.0.0.1). However, GitLab Geo needs to communicate between the primary and secondary nodes over a common network, such as a corporate LAN or the public Internet. For this reason, we need to configure PostgreSQL to listen on more interfaces.
The listen_addresses option opens PostgreSQL up to external connections on the interface corresponding to the given IP. See the PostgreSQL documentation for more details.
You may also want to edit max_wal_senders to match your database replication requirements. Consult the PostgreSQL - Replication documentation for more information.
Set the access control on the primary to allow TCP connections using the server's public IP and set the connection from the secondary to require a password. Edit pg_hba.conf (for Debian/Ubuntu that would be
host all all 127.0.0.1/32 trust
host all all 126.96.36.199/32 trust
host replication gitlab_replicator 188.8.131.52/32 md5
126.96.36.199 is the public IP address of the primary server, and 188.8.131.52 the public IP address of the secondary one. If you want to add another secondary, add one more row like the replication one and change the IP address:
host all all 127.0.0.1/32 trust
host all all 126.96.36.199/32 trust
host replication gitlab_replicator 188.8.131.52/32 md5
host replication gitlab_replicator 220.127.116.11/32 md5
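When there are several secondaries, the replication rows can be generated rather than typed by hand. The following is a minimal sketch, assuming every secondary authenticates as gitlab_replicator with md5; the helper name hba_replication_lines is hypothetical and the addresses are the example ones used in this guide.

```shell
#!/bin/bash
# Print one pg_hba.conf replication entry per secondary IP address.
# hba_replication_lines is an illustrative helper, not part of GitLab or PostgreSQL.
hba_replication_lines() {
  local ip
  for ip in "$@"; do
    printf 'host replication gitlab_replicator %s/32 md5\n' "$ip"
  done
}

# Example: entries for two secondaries (example addresses from this guide).
hba_replication_lines 188.8.131.52 220.127.116.11
```

Review the generated lines before appending them to pg_hba.conf, since the order of entries in that file matters.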
Restart PostgreSQL for the changes to take effect.
Choose a database-friendly name for your secondary to use as the replication slot name. For example, if your domain is geo-secondary.mydomain.com, you may use geo_secondary_my_domain_com as the slot name.
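PostgreSQL replication slot names may only contain lower-case letters, numbers, and underscores. One possible way to derive such a name from a domain is sketched below; the function name slot_name_for is hypothetical, and the result differs slightly from the hand-picked example name above because every other character is simply replaced by an underscore.

```shell
#!/bin/bash
# Derive a replication-slot-safe name from a host name:
# lower-case everything, then replace any character outside a-z and 0-9 with "_".
# slot_name_for is an illustrative helper name.
slot_name_for() {
  local name
  name=$(printf '%s' "$1" | LC_ALL=C tr '[:upper:]' '[:lower:]' | LC_ALL=C tr -c 'a-z0-9' '_')
  printf '%s\n' "$name"
}

slot_name_for geo-secondary.mydomain.com   # prints geo_secondary_mydomain_com
```

Any name works as long as it is unique per secondary and you use the same name consistently on both nodes.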
Create the replication slot on the primary:
$ sudo -u postgres psql -c "SELECT * FROM pg_create_physical_replication_slot('geo_secondary_my_domain_com');"
          slot_name          | xlog_position
-----------------------------+---------------
 geo_secondary_my_domain_com |
(1 row)
Now that the PostgreSQL server is set up to accept remote connections, run netstat -plnt to make sure that PostgreSQL is listening on the server's public IP.
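If you prefer a scripted check over reading the netstat output by eye, something like the following could be used. It is a sketch that assumes the usual Linux netstat layout, where the fourth column is the local address; the helper name listening_on is hypothetical, and port 5432 is the PostgreSQL default.

```shell
#!/bin/bash
# Read `netstat -plnt` output on stdin and succeed if a socket is bound to
# ADDRESS:PORT, or to all IPv4 interfaces (0.0.0.0:PORT).
# listening_on is an illustrative helper name.
listening_on() {
  awk -v addr="$1" -v port="$2" '
    $4 == (addr ":" port) || $4 == ("0.0.0.0:" port) { found = 1 }
    END { exit(found ? 0 : 1) }
  '
}

# Usage on the primary (commented out so the snippet is safe to source):
# netstat -plnt | listening_on 126.96.36.199 5432 && echo "PostgreSQL is listening on the public IP"
```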
Step 2. Add the secondary GitLab node
To prevent the secondary Geo node from trying to act as the primary once the database is replicated, the secondary Geo node must be configured on the primary before the database is replicated.
- Visit the primary node's Admin Area ➔ Geo Nodes (/admin/geo_nodes) in your browser.
- Add the secondary node by providing its full URL. Do NOT check the box 'This is a primary node'.
- Added in GitLab 9.5: Choose which namespaces should be replicated by the secondary node. Leave blank to replicate all. Read more in selective replication.
- Click the Add node button.
Step 3. Configure the secondary server
SSH into your GitLab secondary server and login as root:
Set up PostgreSQL TLS verification on the secondary
If you configured PostgreSQL to accept TLS connections in Step 1, then you need to provide a list of "known-good" certificates to the secondary. It uses this list to keep the connection secure against an active "man-in-the-middle" attack.
If you reused your existing certificates on the primary, you can use the list of valid root certificates provided with your distribution. For Debian/Ubuntu, they can be found in
mkdir -p ~postgres/.postgresql
ln -s /etc/ssl/certs/ca-certificates.crt ~postgres/.postgresql/root.crt
If you generated a self-signed certificate, that won't work. Copy the generated server.crt file onto the secondary server from the primary, then install it in the right place:
install -o postgres -g postgres -m 0400 -T server.crt ~postgres/.postgresql/root.crt
PostgreSQL will now only recognize that exact certificate when verifying TLS connections.
Test that the remote connection to the primary server works:
If you're using a CA-issued certificate and connecting by FQDN:
sudo -u postgres psql -h primary.geo.example.com -U gitlab_replicator -d "dbname=gitlabhq_production sslmode=verify-full" -W
If you're using a self-signed certificate or connecting by IP address:
sudo -u postgres psql -h 126.96.36.199 -U gitlab_replicator -d "dbname=gitlabhq_production sslmode=verify-ca" -W
When prompted, enter the password you set in the first step for the gitlab_replicator user. If all worked correctly, you should see the database prompt.
Exit the PostgreSQL console:
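Edit postgresql.conf to configure the secondary for streaming replication (for Debian/Ubuntu that would be
wal_level = hot_standby
max_wal_senders = 5
min_wal_size = 80MB # checkpoint_segments was removed in PostgreSQL 9.5;
max_wal_size = 1GB # min_wal_size and max_wal_size replace it
wal_keep_segments = 10
hot_standby = on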
Restart PostgreSQL for the changes to take effect.
Optional since GitLab 9.1, and required for GitLab 10.0 or higher: Enable tracking database on the secondary server
Otherwise, continue to initiate the replication process.
Enable tracking database on the secondary server
Geo secondary nodes use a tracking database to keep track of replication status and recover automatically from some replication issues.
It was added in GitLab 9.1 and has been required since GitLab 10.0.
IMPORTANT: For this feature to work correctly, all nodes must have their clocks synchronized. The nodes do not need to be set to the same time zone, but when the respective times are converted to UTC, the clocks must be within 60 seconds of each other.
Set up a clock synchronization service in your Linux distribution. This can easily be done via any NTP-compatible daemon. For example, here are instructions for setting up NTP with Ubuntu.
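To sanity-check the 60-second requirement, you can compare UTC epoch timestamps taken on two nodes. The sketch below only does the arithmetic; the helper name clock_skew_ok and the host name in the usage comment are hypothetical, and how you collect the remote timestamp (SSH here) is an assumption.

```shell
#!/bin/bash
# Succeed if two epoch timestamps (seconds since 1970-01-01 UTC) differ by at
# most MAX seconds (default 60). clock_skew_ok is an illustrative helper name.
clock_skew_ok() {
  local diff=$(( $1 - $2 ))
  [ "${diff#-}" -le "${3:-60}" ]   # ${diff#-} strips a leading minus sign
}

# Usage from the primary (commented out; secondary.geo.example.com is a placeholder):
# clock_skew_ok "$(date -u +%s)" "$(ssh secondary.geo.example.com date -u +%s)" \
#   && echo "clocks are within 60 seconds" || echo "clock skew too large"
```

Epoch timestamps are independent of the time zone configured on each node, which matches the requirement above.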
Create database_geo.yml with the information of your secondary PostgreSQL database. Note that GitLab will set up another database instance separate from the primary, since this is where the secondary will track its internal state:
sudo cp /home/git/gitlab/config/database_geo.yml.postgresql /home/git/gitlab/config/database_geo.yml
Edit the content of production: like the example below:
#
# PRODUCTION
#
production:
  adapter: postgresql
  encoding: unicode
  database: gitlabhq_geo_production
  pool: 10
  username: gitlab_geo
  # password:
  host: /var/opt/gitlab/geo-postgresql
Create the database gitlabhq_geo_production in that PostgreSQL instance.
Set up the Geo tracking database:
bundle exec rake geo:db:migrate
Step 4. Initiate the replication process
Below we provide a script that connects to the primary server, replicates the database and creates the needed files for replication.
The directories used are the defaults for Debian/Ubuntu. If you have changed any defaults, adjust the directories and paths accordingly.
Warning: Make sure to run this on the secondary server as it removes all PostgreSQL's data before running
SSH into your GitLab secondary server and login as root:
Save the snippet below in a file, let's say /tmp/replica.sh. Modify the embedded paths if necessary:
#!/bin/bash

PORT="5432"
USER="gitlab_replicator"
echo ---------------------------------------------------------------
echo WARNING: Make sure this script is run from the secondary server
echo ---------------------------------------------------------------
echo
echo Enter the IP or FQDN of the primary PostgreSQL server
read -r HOST
echo "Enter the password for $USER@$HOST"
read -r -s PASSWORD
echo Enter the required sslmode
read -r SSLMODE

echo Stopping PostgreSQL and all GitLab services
gitlab-ctl stop

echo Backing up postgresql.conf
sudo -u postgres mv /var/opt/gitlab/postgresql/data/postgresql.conf /var/opt/gitlab/postgresql/

echo Cleaning up old cluster directory
sudo -u postgres rm -rf /var/opt/gitlab/postgresql/data
rm -f /tmp/postgresql.trigger

echo Starting base backup as the replicator user
echo "Enter the password for $USER@$HOST"
sudo -u postgres /opt/gitlab/embedded/bin/pg_basebackup -h "$HOST" -p "$PORT" -D /var/opt/gitlab/postgresql/data -U "$USER" -v -x -P

echo Writing recovery.conf file
sudo -u postgres bash -c "cat > /var/opt/gitlab/postgresql/data/recovery.conf <<- _EOF1_
standby_mode = 'on'
primary_conninfo = 'host=$HOST port=$PORT user=$USER password=$PASSWORD sslmode=$SSLMODE'
trigger_file = '/tmp/postgresql.trigger'
_EOF1_
"

echo Restoring postgresql.conf
sudo -u postgres mv /var/opt/gitlab/postgresql/postgresql.conf /var/opt/gitlab/postgresql/data/

echo Starting PostgreSQL and all GitLab services
gitlab-ctl start
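The recovery.conf written by the script does not reference the replication slot created in Step 1. Assuming you want the secondary to consume that slot, the file would additionally need the primary_slot_name setting. The sketch below only renders the file contents to standard output; the function name render_recovery_conf and the argument values are illustrative, and it applies to PostgreSQL versions that still use recovery.conf.

```shell
#!/bin/bash
# Print recovery.conf contents that also name a physical replication slot.
# Arguments: host port user password sslmode slot_name
# render_recovery_conf is an illustrative helper name.
render_recovery_conf() {
  cat <<EOF
standby_mode = 'on'
primary_conninfo = 'host=$1 port=$2 user=$3 password=$4 sslmode=$5'
primary_slot_name = '$6'
trigger_file = '/tmp/postgresql.trigger'
EOF
}

# Example with the placeholder values used elsewhere in this guide.
render_recovery_conf primary.geo.example.com 5432 gitlab_replicator thepassword verify-full geo_secondary_my_domain_com
```

The slot name must match the one created on the primary, and each secondary needs its own slot.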
Run it with:
When prompted, enter the IP/FQDN of the primary, and the password you set up for the gitlab_replicator user in the first step. If you are re-using existing certificates and connecting to an FQDN, use verify-full for the sslmode.
If you have to connect to a specific IP address, rather than the FQDN of the primary, to reach your PostgreSQL server, then you should use verify-ca as the sslmode instead. This should only be the case if you have also used a self-signed certificate.
verify-ca is not safe if you are connecting to an IP address and re-using an existing TLS certificate!
Use prefer if you are happy to skip PostgreSQL TLS authentication altogether (e.g., you know the network path is secure, or you are using a site-to-site VPN).
You can read more details about each sslmode in the PostgreSQL documentation; the instructions above are carefully written to ensure protection against both passive eavesdroppers and active "man-in-the-middle" attackers.
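The sslmode guidance can be summarized as a small decision helper. This is a sketch of one reading of the rules above, not an official tool: verify-full additionally checks that the host name matches the certificate, so it suits a CA-issued certificate reached by FQDN, while verify-ca only checks the certificate chain and is therefore paired with a self-signed certificate installed as root.crt. The function name choose_sslmode and its argument values are hypothetical.

```shell
#!/bin/bash
# Suggest an sslmode for the replication connection.
# $1: certificate type, "ca" (CA-issued, re-used from GitLab) or "self-signed"
# $2: how the secondary reaches the primary, "fqdn" or "ip"
# choose_sslmode is an illustrative helper name.
choose_sslmode() {
  case "$1:$2" in
    ca:fqdn)
      echo verify-full ;;
    self-signed:fqdn|self-signed:ip)
      echo verify-ca ;;
    ca:ip)
      echo "verify-ca is not safe with a re-used certificate and an IP address" >&2
      return 1 ;;
    *)
      echo "unknown combination: $1 $2" >&2
      return 1 ;;
  esac
}

choose_sslmode ca fqdn          # prints verify-full
choose_sslmode self-signed ip   # prints verify-ca
```

The prefer mode is deliberately left out, since it is only appropriate when the network path is secured by other means.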
The replication process is now over.
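One way to confirm that the secondary is running as a standby is PostgreSQL's pg_is_in_recovery() function, which returns t on a standby and f on a primary. The helper below only interprets that single-value output; its name standby_status is hypothetical, and the psql invocation in the usage comment assumes the same postgres system user as the rest of this guide.

```shell
#!/bin/bash
# Interpret the output of: psql -At -c "SELECT pg_is_in_recovery();"
# standby_status is an illustrative helper name.
standby_status() {
  case "$1" in
    t) echo "standby (in recovery)" ;;
    f) echo "primary (not in recovery)" ;;
    *) echo "unknown"; return 1 ;;
  esac
}

# Usage on the secondary (commented out so the snippet is safe to source):
# standby_status "$(sudo -u postgres psql -At -c 'SELECT pg_is_in_recovery();')"
```

On the primary, the pg_stat_replication view lists connected standbys and can be used as a complementary check.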
Now that the database replication is done, the next step is to configure GitLab.
We don't support MySQL replication for GitLab Geo.
Read the troubleshooting document.