Geo: Inconsistent database replication status when Wal-E streaming is in use
On GSTG, I think we are using WAL-E to feed WAL segments to the secondary.
On the /admin/geo_nodes screen, I see the error "The Geo node does not appear to be replicating the database from the primary node".
However, gitlab-rake gitlab:geo:check seems okay in that respect:
stanhu@web-01-sv-gstg.c.gitlab-staging-1.internal:~$ sudo gitlab-rake gitlab:geo:check
Checking Geo ...
GitLab Geo is available ... yes
GitLab Geo is enabled ... yes
GitLab Geo secondary database is correctly configured ... yes
Using database streaming replication? ... yes
GitLab Geo tracking database is configured to use Foreign Data Wrapper? ... yes
GitLab Geo tracking database Foreign Data Wrapper schema is up-to-date? ... no
Try fixing it:
Follow Geo setup instructions to configure secondary nodes with FDW support
If you upgraded recently check for any new step required to enable FDW
If you are using Omnibus GitLab try running:
gitlab-ctl reconfigure
If installing from source, try running:
bundle exec rake geo:db:refresh_foreign_tables
For more information see:
doc/gitlab-geo/database.md
GitLab Geo HTTP(S) connectivity ...
* Can connect to the primary node ... yes
HTTP/HTTPS repository cloning is enabled ... yes
Machine clock is synchronized ... yes
Git user has default SSH configuration? ... yes
OpenSSH configured to use AuthorizedKeysCommand ... warning
Reason:
OpenSSH configuration file points to a different AuthorizedKeysCommand
Try fixing it:
We were expecting AuthorizedKeysCommand to be: /opt/gitlab/embedded/service/gitlab-shell/bin/gitlab-shell-authorized-keys-check git %u %k
but instead it is: /opt/gitlab-shell/authorized_keys %u %k
If you made a custom command, make sure it behaves according to GitLab's Documentation
For more information see:
doc/administration/operations/fast_ssh_key_lookup.md
yes
GitLab configured to disable writing to authorized_keys file ... yes
Checking Geo ... Finished
The Rails console shows:
Gitlab::Geo::HealthCheck.streaming_active?
=> false
Gitlab::Database.pg_stat_wal_receiver_supported?
=> true
However, SystemCheck::Geo::DatabaseReplicationCheck uses:
ActiveRecord::Base.connection.execute('SELECT pg_is_in_recovery()').first.fetch('pg_is_in_recovery') == 't'
We should probably be consistent here. Either both checks should use pg_is_in_recovery(), or we somehow have to know that pg_stat_wal_receiver will not be active when WAL-E is feeding the WAL segments to the secondary.
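To make the two options concrete, here is a minimal sketch of how the two signals could be combined into one answer. `ReplicationStatus`, `classify`, and `replicating?` are hypothetical names for illustration and are not part of GitLab. It assumes that a secondary fed from archived WAL segments is in recovery but has no row in pg_stat_wal_receiver.

```ruby
# Hypothetical helper, not GitLab code: combines the two signals discussed above.
module ReplicationStatus
  # in_recovery:         result of SELECT pg_is_in_recovery()
  # wal_receiver_active: true if pg_stat_wal_receiver has a row
  def self.classify(in_recovery:, wal_receiver_active:)
    # A node that is not in recovery is not acting as a replica at all.
    return :not_replicating unless in_recovery
    # A running WAL receiver means streaming replication is in use.
    return :streaming if wal_receiver_active

    # In recovery without a WAL receiver: assumed to be replaying archived
    # WAL segments, which is the WAL-E case described in this issue.
    :archive_recovery
  end

  # Both streaming and archive recovery count as "replicating".
  def self.replicating?(**facts)
    classify(**facts) != :not_replicating
  end
end

# The GSTG situation from this issue: in recovery, but no WAL receiver.
ReplicationStatus.classify(in_recovery: true, wal_receiver_active: false)
# => :archive_recovery
```

With something along these lines, the admin screen and the rake check could both treat `:archive_recovery` as replicating, while still being able to tell it apart from real streaming.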
/cc: @vsizov
Edited by Brett Walker