Geo: Inconsistent database replication status when Wal-E streaming is in use

On GSTG, I think we are using Wal-E to feed the WAL segments to the secondary.

On the /admin/geo_nodes screen, I see the error, "The Geo node does not appear to be replicating the database from the primary node":

[screenshot of the error message]

However, gitlab-rake gitlab:geo:check reports no problem with database replication:

stanhu@web-01-sv-gstg.c.gitlab-staging-1.internal:~$ sudo gitlab-rake gitlab:geo:check
Checking Geo ...

GitLab Geo is available ... yes
GitLab Geo is enabled ... yes
GitLab Geo secondary database is correctly configured ... yes
Using database streaming replication? ... yes
GitLab Geo tracking database is configured to use Foreign Data Wrapper? ... yes
GitLab Geo tracking database Foreign Data Wrapper schema is up-to-date? ... no
  Try fixing it:
  Follow Geo setup instructions to configure secondary nodes with FDW support
  If you upgraded recently check for any new step required to enable FDW
  If you are using Omnibus GitLab try running:
  gitlab-ctl reconfigure
  If installing from source, try running:
  bundle exec rake geo:db:refresh_foreign_tables
  For more information see:
  doc/gitlab-geo/database.md
GitLab Geo HTTP(S) connectivity ... 
* Can connect to the primary node ... yes
HTTP/HTTPS repository cloning is enabled ... yes
Machine clock is synchronized ... yes
Git user has default SSH configuration? ... yes
OpenSSH configured to use AuthorizedKeysCommand ... warning
  Reason:
  OpenSSH configuration file points to a different AuthorizedKeysCommand
  Try fixing it:
  We were expecting AuthorizedKeysCommand to be: /opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell-authorized-keys-check git %u %k
  but instead it is: /opt/gitlab-shell/authorized_keys %u %k
  If you made a custom command, make sure it behaves according to GitLab's Documentation
  For more information see:
  doc/administration/operations/fast_ssh_key_lookup.md
yes
GitLab configured to disable writing to authorized_keys file ... yes

Checking Geo ... Finished

The Rails console shows:

Gitlab::Geo::HealthCheck.streaming_active?
=> false
Gitlab::Database.pg_stat_wal_receiver_supported?
=> true

However, SystemCheck::Geo::DatabaseReplicationCheck uses:

ActiveRecord::Base.connection.execute('SELECT pg_is_in_recovery()').first.fetch('pg_is_in_recovery') == 't'

We should probably be consistent here. Either both checks should use pg_is_in_recovery(), or the health check has to account for pg_stat_wal_receiver not being active when the secondary is fed WAL segments through Wal-E.
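To make the difference between the two checks concrete, here is a minimal sketch of a helper that separates "in recovery" from "streaming via a WAL receiver". The class name ReplicationStatus, its method names, and the injected query callable are all hypothetical and not part of GitLab. It assumes that pg_stat_wal_receiver returns no rows (or a non-streaming status) when WAL segments arrive through archive restore rather than a streaming connection, which is what the console output above suggests.

```ruby
# Hypothetical helper (not GitLab code): classifies how a secondary receives WAL.
# `query` is assumed to be a callable that runs SQL and returns an array of
# row hashes, e.g. ->(sql) { ActiveRecord::Base.connection.execute(sql).to_a }
class ReplicationStatus
  def initialize(query)
    @query = query
  end

  # True when PostgreSQL runs as a standby, whatever the source of the WAL.
  def in_recovery?
    value = @query.call('SELECT pg_is_in_recovery()').first.fetch('pg_is_in_recovery')
    # Depending on the adapter, the boolean may come back as true or as 't'.
    value == true || value == 't'
  end

  # True only when a WAL receiver process is connected and streaming.
  def streaming?
    row = @query.call('SELECT status FROM pg_stat_wal_receiver').first
    !row.nil? && row['status'] == 'streaming'
  end

  # :not_replicating  - the database is not a standby at all
  # :streaming        - standby with an active WAL receiver
  # :archive_recovery - standby replaying WAL without a receiver
  #                     (the situation described in this issue)
  def mode
    return :not_replicating unless in_recovery?

    streaming? ? :streaming : :archive_recovery
  end
end
```

With something along these lines, both the admin screen and the rake check could treat :streaming and :archive_recovery as "replicating" and only flag :not_replicating as an error. Whether that is the right policy is a judgment call for the Geo team.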

/cc: @vsizov

Edited by Brett Walker