Geo: Upgrade race condition can leave foreign tables out-of-date
As discovered in omnibus-gitlab#3474 (comment 212340753):
@mkozono @fzimmer Writing down https://gitlab.com/gitlab-org/gitlab-ee/issues/14000#note_212310220 gave me an idea on this...
The foreign schema is recreated with:
DROP SCHEMA IF EXISTS gitlab_secondary CASCADE;
CREATE SCHEMA gitlab_secondary;
IMPORT FOREIGN SCHEMA public FROM SERVER gitlab_secondary INTO gitlab_secondary;
It runs on the tracking db, against the main database. But the Geo secondary node does not update the schema of the main db itself; that only happens when the Geo primary node performs the migrations and those schema changes are replicated to the Geo secondary main db.
Therefore, my theory:
- The update on the Geo secondary happens
- During that update, FDW is refreshed, but against the old main db schema
- Later... the schema changes are replicated from the Geo primary to the Geo secondary node (because the Geo primary was updated later, or there is a large replication lag)
- Geo secondary detects FDW mismatch
So although there might be a problem with the conditions under which the FDW refresh can run (omnibus-gitlab#3474 (comment 198478540)), there might also be an order-dependent problem.
Also keep in mind post-deployment migrations: they run while the new version is already running (i.e. after the upgrade is complete), so their schema changes can reach the Geo secondary after the FDW refresh has already happened.
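To make the suspected ordering concrete, here is a small Python model of the theory. It is only a sketch: `SecondaryNode`, its methods, and the example table and column names are hypothetical stand-ins, not GitLab code, and `refresh_fdw` merely imitates what the DROP SCHEMA / CREATE SCHEMA / IMPORT FOREIGN SCHEMA sequence does.

```python
from dataclasses import dataclass, field


@dataclass
class SecondaryNode:
    # Hypothetical model: columns per table in the replicated main db.
    main_db: dict = field(default_factory=dict)
    # Hypothetical model: snapshot of foreign tables in the tracking db.
    foreign_tables: dict = field(default_factory=dict)

    def replicate(self, primary_schema):
        # Schema changes only arrive via replication from the primary.
        self.main_db = {t: set(cols) for t, cols in primary_schema.items()}

    def refresh_fdw(self):
        # Imitates re-importing the foreign schema: the snapshot copies
        # whatever the main db looks like at this moment.
        self.foreign_tables = {t: set(cols) for t, cols in self.main_db.items()}

    def fdw_mismatch(self):
        return self.foreign_tables != self.main_db


old_schema = {"projects": {"id", "name"}}
new_schema = {"projects": {"id", "name", "example_new_column"}}

secondary = SecondaryNode()
secondary.replicate(old_schema)

# The secondary is upgraded; FDW is refreshed against the old schema.
secondary.refresh_fdw()

# Later, the primary's migrations are replicated to the secondary.
secondary.replicate(new_schema)
mismatch_after_race = secondary.fdw_mismatch()

# Refreshing again, after replication has caught up, clears the mismatch.
secondary.refresh_fdw()
mismatch_after_second_refresh = secondary.fdw_mismatch()
```

In this model the mismatch appears purely because of ordering: the refresh itself works, it just runs before the new schema has arrived.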
Steps to reproduce
(How one can reproduce the issue - this is very important)
What is the current bug behavior?
FDW mismatch after completing an upgrade following all steps.
What is the expected correct behavior?
No FDW mismatch after completing an upgrade following all steps.
Relevant logs and/or screenshots
(Paste any relevant logs - please use code blocks (```) to format console output, logs, and code as it's tough to read otherwise.)
Possible fix: Modify upgrade instructions
Instruct the sysadmin how to ensure post-migrations are finished and replicated, and only then run
Follow up: Add a rake task or something to make it easier.
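As a starting point for that follow-up, the check such a task would need could look roughly like the Python sketch below. It assumes the standard Rails convention that migration files are named `<version>_<name>.rb` and that applied versions are recorded in the `schema_migrations` table; the function names and the example filenames are hypothetical, and how the replicated versions are actually read from the secondary's main db is left out.

```python
import re


def expected_versions(migration_filenames):
    """Extract the leading numeric version from each migration filename."""
    versions = set()
    for name in migration_filenames:
        match = re.match(r"(\d+)_", name)
        if match:
            versions.add(match.group(1))
    return versions


def pending_on_secondary(migration_filenames, replicated_versions):
    """Versions shipped with the release but not yet visible on the secondary."""
    return sorted(expected_versions(migration_filenames) - set(replicated_versions))


def safe_to_refresh_fdw(migration_filenames, replicated_versions):
    """True only when every regular and post-deployment migration has replicated."""
    return not pending_on_secondary(migration_filenames, replicated_versions)


# Hypothetical example data: one regular and one post-deployment migration.
shipped = [
    "20190101000000_add_example_column.rb",
    "20190102000000_cleanup_example_data.rb",
]
replicated = ["20190101000000"]

pending = pending_on_secondary(shipped, replicated)
ok_before = safe_to_refresh_fdw(shipped, replicated)
ok_after = safe_to_refresh_fdw(shipped, replicated + ["20190102000000"])
```

The point of comparing against the shipped migration list, rather than just checking replication lag, is that post-deployment migrations may not have run on the primary yet even when replication is fully caught up.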