Validate that we should support FDW for Geo selective sync
Description
Geo needs to perform cross-database queries for counts and repository / file backfill. We compare the project_registry table in the Geo tracking database to the projects table in the main DB, and the file_registry table to the uploads, ci_artifacts, and lfs_objects tables. This can be done using FDW (foreign data wrapper) or non-FDW ("legacy") queries, with the former being preferred in most cases.
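To make the difference between the two query strategies concrete, here is a minimal sketch in Python using sqlite3. It is not Geo's actual code: SQLite's ATTACH DATABASE stands in for a PostgreSQL foreign data wrapper, the table contents are made up, and the helper names (make_main_db, make_tracking_db, make_joined_db, unsynced_legacy, unsynced_joined) and the "upstream" schema name are hypothetical. The legacy variant pulls every registry ID into application memory and sends it back as a NOT IN list, which is the part that grows with the number of projects; the FDW-like variant lets the database do a single join.

```python
import sqlite3

REGISTERED_IDS = [(1,), (2,), (4,)]  # made-up data: projects already tracked


def make_main_db():
    """Stand-in for the main DB: holds the projects table."""
    db = sqlite3.connect(":memory:", isolation_level=None)  # autocommit
    db.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT)")
    db.executemany(
        "INSERT INTO projects (id, name) VALUES (?, ?)",
        [(i, f"project-{i}") for i in range(1, 6)],
    )
    return db


def make_tracking_db():
    """Stand-in for the Geo tracking DB: holds project_registry."""
    db = sqlite3.connect(":memory:", isolation_level=None)
    db.execute("CREATE TABLE project_registry (project_id INTEGER PRIMARY KEY)")
    db.executemany(
        "INSERT INTO project_registry (project_id) VALUES (?)", REGISTERED_IDS
    )
    return db


def make_joined_db():
    """Tracking DB that can also see projects, loosely imitating FDW.

    Assumption: ATTACH DATABASE is only an illustration of "both tables
    visible from one connection"; it is not how PostgreSQL FDW is set up.
    """
    db = make_tracking_db()
    db.execute("ATTACH DATABASE ':memory:' AS upstream")
    db.execute("CREATE TABLE upstream.projects (id INTEGER PRIMARY KEY, name TEXT)")
    db.executemany(
        "INSERT INTO upstream.projects (id, name) VALUES (?, ?)",
        [(i, f"project-{i}") for i in range(1, 6)],
    )
    return db


def unsynced_legacy(main_db, tracking_db):
    """Legacy style: fetch all registry IDs, then filter projects with NOT IN."""
    ids = [row[0] for row in tracking_db.execute(
        "SELECT project_id FROM project_registry")]
    placeholders = ",".join("?" for _ in ids)  # one bind parameter per ID
    sql = f"SELECT id FROM projects WHERE id NOT IN ({placeholders}) ORDER BY id"
    return [row[0] for row in main_db.execute(sql, ids)]


def unsynced_joined(db):
    """FDW-like style: one anti-join, no ID list shipped through the app."""
    sql = """
        SELECT p.id
        FROM upstream.projects AS p
        LEFT JOIN project_registry AS r ON r.project_id = p.id
        WHERE r.project_id IS NULL
        ORDER BY p.id
    """
    return [row[0] for row in db.execute(sql)]
```

Both functions return the same set of project IDs without a registry entry; they differ only in where the comparison happens. In the legacy variant the size of the ID list and of the generated SQL grows linearly with the number of registry rows.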
When ~Geo selective sync is enabled, we unconditionally use the legacy queries, because we haven't yet implemented the necessary conditionals for selective sync in the FDW case. The legacy queries do not scale to large numbers of projects, which may make selective sync unsuitable for large secondary instances, or for a staged rollout of Geo on a large instance, as desired in https://gitlab.com/gitlab-org/gitlab-ee/issues/4625
Proposal
Verify whether the intuition that the legacy queries do not scale when selective sync is used with large sets of projects is correct. If it is, implement selective sync support for the various FDW queries in https://gitlab.com/gitlab-org/gitlab-ee/tree/master/ee/app/finders/geo