Geo: Fix package file backfill with sync object storage disabled

Merged Michael Kozono requested to merge mk/backfill-local-only-by-default into master

What does this MR do?

When "Sync object storage" is disabled (which is the default):

  • Preexisting package files that are in object storage are no longer backfilled
  • Package files in object storage, and especially their registry records, are automatically removed (by RegistryConsistencyWorker, the same worker that does the backfill)
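
As a rough illustration of the two rules above, here is a minimal plain-Ruby sketch. It is not the code in this MR: PackageFile, replicable? and plan_registry_changes are hypothetical names, and the REMOTE_STORE value is an assumption. Only file_store = 1 (local) comes from the queries in this description.

```ruby
# Hypothetical stand-in for a package file record; not GitLab's model.
PackageFile = Struct.new(:id, :file_store)

LOCAL_STORE = 1   # matches "file_store" = 1 in the queries in this MR
REMOTE_STORE = 2  # assumed value for object storage

# A file should be replicated if object storage sync is on,
# or if the file is stored locally.
def replicable?(file, sync_object_storage:)
  sync_object_storage || file.file_store == LOCAL_STORE
end

# Compare the replicable files against the ids that already have a
# registry record, and report what to create and what to remove.
def plan_registry_changes(files, registered_ids, sync_object_storage:)
  replicable_ids = files
    .select { |f| replicable?(f, sync_object_storage: sync_object_storage) }
    .map(&:id)

  {
    to_backfill: replicable_ids - registered_ids,
    to_remove: registered_ids - replicable_ids
  }
end
```

With "Sync object storage" disabled, an object-stored file that already has a registry record ends up in to_remove, and an unregistered object-stored file never appears in to_backfill.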

For example, on staging this would make the Package Files failures go away.

Also adds an index on the file_store column of packages_package_files to improve these queries when "Sync object storage" is disabled.

Part of #224634 (closed)

Database queries

with sync object storage disabled

SELECT "packages_package_files".*
FROM "packages_package_files"
WHERE "packages_package_files"."file_store" = 1

with selective sync by namespace and sync object storage disabled

WITH "restricted_packages" AS (
  SELECT "packages_packages"."id"
  FROM "packages_packages"
  WHERE "packages_packages"."project_id" IN (
    SELECT "projects"."id"
    FROM "projects"
    WHERE "projects"."namespace_id" IN (
      WITH RECURSIVE "base_and_descendants" AS (
        (SELECT "geo_node_namespace_links"."namespace_id" AS id
         FROM "geo_node_namespace_links"
         WHERE "geo_node_namespace_links"."geo_node_id" = 100109)
        UNION
        (SELECT "namespaces"."id"
         FROM "namespaces", "base_and_descendants"
         WHERE "namespaces"."parent_id" = "base_and_descendants"."id"))
      SELECT "id"
      FROM "base_and_descendants" AS "namespaces")))
SELECT "packages_package_files".*
FROM "restricted_packages"
INNER JOIN "packages_package_files"
ON "restricted_packages"."id" = "packages_package_files"."package_id"
WHERE "packages_package_files"."file_store" = 1
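
To make the recursive part of the query above easier to follow, here is a small plain-Ruby sketch of what the "base_and_descendants" CTE computes: the namespaces selected for the Geo node plus every namespace nested under them. The method name and the parent_of hash are hypothetical and stand in for the namespaces table; this is not how GitLab builds the query.

```ruby
require "set"

# selected_ids: namespace ids linked to the Geo node (the base of the CTE).
# parent_of: hypothetical Hash of namespace id => parent id (nil for top level).
def base_and_descendants(selected_ids, parent_of)
  result = Set.new(selected_ids)
  frontier = selected_ids.dup

  # Repeatedly add namespaces whose parent was found in the previous round,
  # which mirrors the recursive step of the CTE.
  until frontier.empty?
    children = parent_of
      .select { |id, parent| frontier.include?(parent) && !result.include?(id) }
      .keys
    result.merge(children)
    frontier = children
  end

  result
end
```

Projects in the resulting namespaces restrict the packages, and only their package files with file_store = 1 are selected.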

Plan with index added:

Plan on namespaces with a lot of packages (they both have at least one project with 15k package files or more): (index not added in this explain)



➜  gitlab git:(mk/backfill-local-only-by-default) ✗ bin/rake db:migrate 
== 20200715202659 AddIndexOnPackageFilesFileStore: migrating ==================
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:packages_package_files, :file_store, {:algorithm=>:concurrently})
   -> 0.0027s
-- add_index(:packages_package_files, :file_store, {:algorithm=>:concurrently})
   -> 0.0063s
== 20200715202659 AddIndexOnPackageFilesFileStore: migrated (0.0094s) =========


➜  gitlab git:(mk/backfill-local-only-by-default) bin/rake db:rollback  
== 20200715202659 AddIndexOnPackageFilesFileStore: reverting ==================
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:packages_package_files, :file_store, {:algorithm=>:concurrently})
   -> 0.0032s
-- remove_index(:packages_package_files, {:algorithm=>:concurrently, :column=>:file_store})
   -> 0.0030s
== 20200715202659 AddIndexOnPackageFilesFileStore: reverted (0.0065s) =========





The Package files progress bar would instead say "Nothing to sync", and the Total in the popover would be 0.

Does this MR meet the acceptance criteria?


Availability and Testing

Edited by Michael Kozono