
Gracefully handle non-existing repos during registry phase 2 migration

João Pereira requested to merge 356984-fix into master

Context

Related to #356984 (closed). Similar to !83446 (merged), from which some of the context and the local testing instructions were copied.

We're currently implementing a data migration on the Container Registry. This migration is going to be driven by the Rails backend.

We won't go into all the details here, but in short the migration goes through several states. Those states are implemented as a state machine on the ContainerRepository model.
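To make the idea of migration states more concrete, here is a minimal plain-Ruby sketch of a state machine with guarded transitions. The class MigrationStateMachine, its event names, and the transition table are hypothetical simplifications (only import_done, import_skipped, and the pre-import step come from this MR); the real ContainerRepository model defines its own states, events, and guards, so treat this purely as an illustration of the pattern.

```ruby
# Hypothetical, simplified model of migration states. The real
# ContainerRepository state machine has more events, states, and guards.
class MigrationStateMachine
  TRANSITIONS = {
    start_pre_import:  { from: %w[default], to: "pre_importing" },
    finish_pre_import: { from: %w[pre_importing], to: "pre_import_done" },
    start_import:      { from: %w[pre_import_done], to: "importing" },
    finish_import:     { from: %w[pre_importing importing], to: "import_done" },
    skip_import:       { from: %w[default pre_importing pre_import_done importing], to: "import_skipped" }
  }.freeze

  attr_reader :state

  def initialize(state = "default")
    @state = state
  end

  # Fires an event. Returns false and keeps the current state when the
  # event is not allowed from the current state.
  def fire(event)
    transition = TRANSITIONS.fetch(event)
    return false unless transition[:from].include?(@state)

    @state = transition[:to]
    true
  end
end

machine = MigrationStateMachine.new
machine.fire(:start_pre_import) # allowed from "default"
machine.fire(:start_import)     # rejected: pre-import has not finished
puts machine.state              # prints "pre_importing"
```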

Under some conditions, the image repository being migrated may not exist on the Container Registry side. On the Rails side, a row in the container_repositories table is created for a repository X whenever a user requests a JWT token for X (if the row doesn't already exist). Therefore, a user can request a token for repository foo/bar but never actually push images to it. In that case, foo/bar is never created on the registry side, although it "exists" on the Rails side.
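The mismatch can be reproduced with a toy model of the two sides. RailsSide, RegistrySide, request_token, and push are hypothetical stand-ins (no real JWT is issued); the sketch only shows that a token request creates the Rails row, while only a push creates the repository on the registry.

```ruby
require "set"

# Hypothetical stand-in for the Rails side (the container_repositories table).
class RailsSide
  attr_reader :container_repositories

  def initialize
    @container_repositories = Set.new
  end

  # Requesting a token for a repository path creates the row if it is missing.
  def request_token(path)
    @container_repositories << path
    "token-for-#{path}" # placeholder string, not a real JWT
  end
end

# Hypothetical stand-in for the Container Registry.
class RegistrySide
  attr_reader :repositories

  def initialize
    @repositories = Set.new
  end

  # Only an actual push creates the repository on the registry side.
  def push(path)
    @repositories << path
  end
end

rails = RailsSide.new
registry = RegistrySide.new

rails.request_token("foo/bar") # the user authenticates but never pushes
rails.request_token("foo/baz")
registry.push("foo/baz")

# Repositories known to Rails but missing from the registry.
missing = rails.container_repositories - registry.repositories
p missing.to_a # prints ["foo/bar"]
```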

Currently, whenever Rails attempts to start a migration for a repository that doesn't exist on the registry side, the registry replies with a 404 Not Found. On the Rails side, this causes the migration_state of that repository record to be set to import_skipped, with migration_skipped_reason set to not_found. The problem with this approach is that import_skipped should be reserved for repositories whose migration failed due to an unexpected issue, which is not the case for non-existing repositories. We should handle these gracefully instead.
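A rough sketch of the current (pre-MR) handling, assuming the registry client translates HTTP status codes into symbols such as :ok and :not_found. RESPONSE_CODES, Repo, and handle_response_before are hypothetical names, and falling back to import_aborted for other responses is an assumption made only to keep the example complete.

```ruby
# Hypothetical mapping from registry HTTP status codes to symbols; only the
# codes relevant to this example are listed.
RESPONSE_CODES = {
  202 => :ok,
  404 => :not_found
}.freeze

Repo = Struct.new(:migration_state, :migration_skipped_reason)

# Old behavior (sketch): a 404 puts the repository in import_skipped, the
# state that is meant for unexpected failures.
def handle_response_before(repo, http_status)
  case RESPONSE_CODES.fetch(http_status, :error)
  when :ok
    true
  when :not_found
    repo.migration_state = "import_skipped"
    repo.migration_skipped_reason = "not_found"
    false
  else
    # Assumed fallback for any other response.
    repo.migration_state = "import_aborted"
    false
  end
end

repo = Repo.new("pre_importing", nil)
handle_response_before(repo, 404)
p repo.to_a # prints ["import_skipped", "not_found"]
```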

Note that all of this migration logic is gated behind a feature flag that is not enabled on gitlab.com.

What does this MR do and why?

  • In ContainerRepository#try_import, when the registry reports that a repository doesn't exist, we now:
    • Set the migration state to import_done
    • Set the skip reason to not_found
  • Update the related specs
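The new handling in try_import can be sketched roughly as follows. This is plain Ruby, not the actual ContainerRepository code: Repo, finish_import_as, and the import_aborted fallback are hypothetical simplifications, and the block is assumed to return the registry response as a symbol.

```ruby
Repo = Struct.new(:migration_state, :migration_skipped_reason)

# Hypothetical helper: marks the import as done and records why the
# repository did not need to be migrated.
def finish_import_as(repo, reason)
  repo.migration_state = "import_done"
  repo.migration_skipped_reason = reason.to_s
end

# Sketch of a try_import-like method. The block returns the registry
# response as a symbol, e.g. :ok or :not_found.
def try_import(repo)
  raise ArgumentError, "block not given" unless block_given?

  case yield
  when :ok
    true
  when :not_found
    # Nothing was ever pushed, so there is nothing to migrate:
    # mark the import as done instead of skipped.
    finish_import_as(repo, :not_found)
    false
  else
    # Assumed fallback for any other response.
    repo.migration_state = "import_aborted"
    false
  end
end

repo = Repo.new("pre_importing", nil)
try_import(repo) { :not_found }
p repo.to_a # prints ["import_done", "not_found"]
```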

How to set up and validate locally

  1. Have a GDK ready with the registry enabled.
  2. Set up the feature flags:
    Feature.enable(:container_registry_migration_phase2_enabled)

We're now ready to test.

  1. Create a dummy container repository:
    repo = FactoryBot.create(:container_repository, project: Project.first)
  2. Getting a registry set up with all the parts for the migration is quite involved. Instead, we're going to "stub" the registry response to the pre-import call. Replace ContainerRepository#migration_pre_import with:
    def migration_pre_import
      # Simulate the registry replying 404 Not Found for this repository
      :not_found
    end
  3. Comment out the ContainerRepository#start_pre_import method (L262).
  4. Reload the console if necessary:
    reload!
  5. Start the pre-import:
    repo.start_pre_import
  6. Inspect the repo:
    [7] pry(main)> repo
     => #<ContainerRepository:0x00007f85aa1880d0
     id: 8,
     project_id: 125,
     name: "test_image_2",
     created_at: Mon, 28 Mar 2022 12:47:12.214586000 UTC +00:00,
     updated_at: Mon, 28 Mar 2022 12:47:16.255844000 UTC +00:00,
     status: nil,
     expiration_policy_started_at: nil,
     expiration_policy_cleanup_status: "cleanup_unscheduled",
     expiration_policy_completed_at: nil,
     migration_pre_import_started_at: Mon, 28 Mar 2022 12:47:16.243156000 UTC +00:00,
     migration_pre_import_done_at: nil,
     migration_import_started_at: nil,
     migration_import_done_at: nil,
     migration_aborted_at: nil,
     migration_skipped_at: nil,
     migration_retries_count: 0,
     migration_skipped_reason: "not_found",
     migration_state: "import_done",
     migration_aborted_in_state: nil,
     migration_plan: nil>

The migration state is import_done and the skip reason is not_found, as expected.
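The validation steps above can also be simulated outside of GDK. Instead of editing migration_pre_import in place, this sketch uses Ruby's Module#prepend to override it with a stub that returns :not_found. FakeContainerRepository and NotFoundRegistryStub are hypothetical, and start_pre_import here is a drastically simplified stand-in for the real state machine event.

```ruby
# Hypothetical, heavily simplified stand-in for ContainerRepository.
class FakeContainerRepository
  attr_reader :migration_state, :migration_skipped_reason

  def initialize
    @migration_state = "default"
    @migration_skipped_reason = nil
  end

  # The real model calls the registry API here; without a registry it fails.
  def migration_pre_import
    raise NotImplementedError, "no registry available"
  end

  def start_pre_import
    @migration_state = "pre_importing"
    case migration_pre_import
    when :ok
      true
    when :not_found
      @migration_state = "import_done"
      @migration_skipped_reason = "not_found"
      false
    else
      @migration_state = "import_aborted"
      false
    end
  end
end

# Stub the registry response without editing the class body: prepended
# modules are looked up before the class's own methods.
module NotFoundRegistryStub
  def migration_pre_import
    :not_found
  end
end
FakeContainerRepository.prepend(NotFoundRegistryStub)

repo = FakeContainerRepository.new
repo.start_pre_import
p [repo.migration_state, repo.migration_skipped_reason] # prints ["import_done", "not_found"]
```

Keeping the stub in its own module means the model's source stays untouched; not loading the module restores the original behavior.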

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by David Fernandez
