Geo Secondary shows unhealthy after enabling FDW in 10.6.4

Zendesk ticket: https://gitlab.zendesk.com/agent/tickets/94818

Summary

After an upgrade from 10.5.4 to 10.6.4, the secondary Geo node shows as unhealthy in /admin/geo_nodes, with a warning that the Foreign Data Wrapper (FDW) schema is out of sync.

Relevant logs

gitlab-ctl reconfigure shows:

* postgresql_fdw[gitlab_secondary] action create
* postgresql_query[enable postgres_fdw extension on gitlabhq_geo_production] action run (skipped due to not_if)
* postgresql_query[create fdw gitlab_secondary on gitlabhq_geo_production] action run (skipped due to not_if)
* postgresql_query[update fdw gitlab_secondary on gitlabhq_geo_production] action run (skipped due to not_if)

gitlab-rake gitlab:geo:check shows:

Checking Geo ...

GitLab Geo is available ... yes
GitLab Geo is enabled ... yes
GitLab Geo secondary database is correctly configured ... yes
Using database streaming replication? ... yes
GitLab Geo tracking database is configured to use Foreign Data Wrapper? ... yes
GitLab Geo tracking database Foreign Data Wrapper schema is up-to-date? ... no
Try fixing it:
Follow Geo setup instructions to configure secondary nodes with FDW support
If you upgraded recently check for any new step required to enable FDW
If you are using Omnibus GitLab try running:
gitlab-ctl reconfigure

Troubleshooting steps taken

Verified contents of gitlab.rb and ran:

gitlab-ctl reconfigure
gitlab-ctl restart postgresql
gitlab-ctl reconfigure

Ran database migrations on tracking database:

gitlab-rake geo:db:migrate

Refreshed foreign tables:

gitlab-rake geo:db:refresh_foreign_tables

After the refresh, the tracking database still contained no foreign tables:

gitlabhq_geo_production=# \det
 List of foreign tables
 Schema | Table | Server 
--------+-------+--------
(0 rows)

Dropped SERVER and SCHEMA:

DROP SERVER gitlab_secondary CASCADE;
DROP SCHEMA gitlab_secondary;

We also saw the following:

Running SELECT count(1) FROM gitlab_secondary.projects; gives the error:

ERROR: No user mapping found for gitlab-psql

and GitLab complains about the foreign tables being out of date.

The reconfigure sets up a user mapping for gitlab_geo, but the server appears to be using gitlab-psql.
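
To check which role actually has a mapping, queries along these lines could be run in a psql session on gitlabhq_geo_production. This is only a sketch: it relies on the standard PostgreSQL pg_user_mappings view and assumes the foreign server is named gitlab_secondary, as in the reconfigure output above. It was not part of the troubleshooting already performed.

```sql
-- Which role is this session running as?
SELECT current_user;

-- User mappings defined for the foreign server.
-- Assumption: the server is named gitlab_secondary, as in the reconfigure output.
-- umoptions may be NULL if the current role is not allowed to see the options.
SELECT srvname, usename, umoptions
FROM pg_user_mappings
WHERE srvname = 'gitlab_secondary';
```

If the second query returns a row for gitlab_geo but none for the role reported by the first query, that would be consistent with the "No user mapping found" error above.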

We disabled FDW by setting geo_secondary['db_fdw'] = false and running a reconfigure. The secondary node still shows as unhealthy, and there is now a warning that FDW is not being used.
