Geo: Improve performance of backfill queries
Problem to solve
Many backfill queries no longer rely on FDW due to work in &2851 (closed), and therefore some queries which were written to perform well with FDW should now be written without CTEs because it may improve performance. Also it should be more readable.
This issue is follow up of this discussion !32797 (comment 354651914).
Intended users
User experience goal
Improved performance of backfill queries.
Proposal
Remove CTEs from backfill queries where FDW is no longer used. We can start with just one case, as a small iteration.
-
Remove CTE from Packages::PackageFile.replicables_for_geo_node -
Open follow up issue to do the same for the others
Further details
From !32797 (diffs, comment 354651914):
I found a sample query looks like:
WITH "restricted_packages" AS ( SELECT "packages_packages"."id" FROM "packages_packages" WHERE "packages_packages"."project_id" IN ( SELECT "projects"."id" FROM "projects"."namespace_id" = 9970)) SELECT "packages_package_files"."id" FROM "restricted_packages" INNER JOIN "packages_package_files" ON "restricted_packages"."id" = "packages_package_files"."package_id" WHERE "packages_package_files"."id" BETWEEN 1 AND 1000with an explain plan here: https://explain.depesz.com/s/KzO
Is there a reason for using the CTE here? My concern is that if the list of
projectsis too large, the CTE will access/materialize a large data set. For example that query can be rewritten to not use a CTE:SELECT packages_package_files.id FROM packages_package_files INNER JOIN packages_packages ON packages_package_files.package_id = packages_packages.id WHERE packages_packages.project_id IN (SELECT projects.id FROM projects WHERE namespace_id = 9970) AND packages_package_files.id BETWEEN 1 AND 1000with an explain here: https://explain.depesz.com/s/HNkO