
Creating New Nested Repositories During Import Creates Parent Repositories Marked as "native"

Context

During import, we create new repositories in the database, into which the filesystem metadata can be imported. If the repository is nested, as most are, we also create each parent repository before creating the new repository.

Problem

If a child repository is migrated before a parent repository that also contains images, the parent repository is created in the database and marked native. Because of that status, the parent repository later appears to the import handler logic as already migrated. Additionally, the phase 2 routing logic begins routing requests for the parent repository through the database, although its data is still only present on the filesystem, causing a split brain.

Solutions

Manual Parent Repository Creation

Within the import handler logic, we could avoid using CreateOrFindByPath and instead use CreateOrFind to create each repository in order, setting the status appropriately if a parent repository does not exist.

Pros

  • should prevent race conditions with the phase 2 routing logic
  • this change would be accomplished by simply composing existing methods within the import handler logic

Cons

  • the handlers rely completely on integration tests, which are less well suited to validating more subtle changes in behavior than unit tests
  • we would be duplicating logic already implemented in the RepositoryStore to find and create the parent repositories
  • if these parent repositories are not already present on the old storage prefix, we must always update their status to native once they receive a write request
  • further complicates phase 2 routing logic
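
As a rough illustration of this option, the sketch below creates each missing parent in order, top-most first, with a caller-chosen status instead of the default. Everything in it is an assumption made for illustration: memStore is an in-memory stand-in for the RepositoryStore, createOrFind only mimics the idea behind CreateOrFind and not its real signature, and "import_in_progress" is a placeholder status. Only "native" is named in this issue.

```go
package main

import "strings"

// Hypothetical migration statuses. Only "native" is named in this issue;
// "import_in_progress" is an assumed placeholder for the importer's own value.
const (
	statusNative           = "native"
	statusImportInProgress = "import_in_progress"
)

// memStore is an in-memory stand-in for the RepositoryStore: path -> status.
type memStore map[string]string

// createOrFind mimics the idea behind CreateOrFind (not its real signature):
// it stores the repository with the given status only if the path is new,
// and returns the status that is stored afterwards.
func (s memStore) createOrFind(path, status string) string {
	if existing, ok := s[path]; ok {
		return existing
	}
	s[path] = status
	return status
}

// ancestorPaths returns every parent path of p, top-most first.
// For "a/b/c" it returns ["a", "a/b"].
func ancestorPaths(p string) []string {
	parts := strings.Split(p, "/")
	out := make([]string, 0, len(parts))
	for i := 1; i < len(parts); i++ {
		out = append(out, strings.Join(parts[:i], "/"))
	}
	return out
}

// createForImport creates missing parents one by one with parentStatus,
// leaves existing parents untouched, and then creates the target repository
// with the importer's own status.
func createForImport(s memStore, path, parentStatus string) {
	for _, parent := range ancestorPaths(path) {
		s.createOrFind(parent, parentStatus)
	}
	s.createOrFind(path, statusImportInProgress)
}
```

Calling createForImport(store, "a/b/c", someStatus) on a store that already holds "a" as native would leave "a" unchanged and add "a/b" with someStatus. Which status is appropriate for those parents is exactly the open question of this option.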

Top Down Import

The import workers should only queue repositories whose parents have been imported.

Pros

  • the current default logic for top level repositories works appropriately for this scenario.

Cons

  • requires an external entity to correctly manage internal registry states
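
A minimal sketch of what the external entity would have to do, assuming it knows the full set of repository paths on the filesystem. The helper names topDownOrder and parentsImported, and the two sets passed to them, are hypothetical and not part of the registry.

```go
package main

import (
	"sort"
	"strings"
)

// topDownOrder returns a sorted copy of paths in which every repository comes
// after all of its ancestors: shallower paths first, ties broken alphabetically.
func topDownOrder(paths []string) []string {
	out := append([]string(nil), paths...)
	sort.Slice(out, func(i, j int) bool {
		di, dj := strings.Count(out[i], "/"), strings.Count(out[j], "/")
		if di != dj {
			return di < dj
		}
		return out[i] < out[j]
	})
	return out
}

// parentsImported reports whether every ancestor of path that is itself a
// repository (present in repos) has already been imported.
// Ancestors that are not repositories are ignored.
func parentsImported(path string, repos, imported map[string]bool) bool {
	for i := strings.LastIndex(path, "/"); i >= 0; i = strings.LastIndex(path[:i], "/") {
		parent := path[:i]
		if repos[parent] && !imported[parent] {
			return false
		}
	}
	return true
}
```

Either helper would be enough on its own: the first fixes the queue order up front, the second is a guard applied just before queuing a single repository.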

Distinct Migration Status Value for Automatically Created Parent Repositories

Instead of defaulting to native, repositories that are created incidentally while creating a child repository get a distinct migration status, indicating that they have not been explicitly created.

Pros

  • small change
  • should prevent race conditions with the phase 2 routing logic
  • easier to achieve a high degree of confidence with tests
  • handler logic needs to know far less about repository creation

Cons

  • further complicates phase 2 routing logic
  • if these parent repositories are not already present on the old storage prefix, we must always update their status to native once they receive a write request, so that the phase 2 routing logic and the import handler can handle these repositories efficiently and correctly
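
The sketch below shows how such a status could interact with routing and with the promotion described in the last con. It is a simplification under stated assumptions: the status name "auto_created", the repository struct, and both functions are hypothetical, and the real phase 2 routing logic considers more than a single status check.

```go
package main

const (
	statusNative      = "native"       // named in this issue
	statusAutoCreated = "auto_created" // hypothetical name for the distinct status
)

// repository is a simplified stand-in for a database repository record.
type repository struct {
	Path   string
	Status string
}

// routeToDatabase is a simplified routing rule for this sketch: only native
// repositories are served from the database, so an automatically created
// parent keeps being served from the filesystem.
func routeToDatabase(r repository) bool {
	return r.Status == statusNative
}

// onWrite promotes an automatically created parent to native on its first
// write request, but only if it is not present on the old storage prefix.
func onWrite(r *repository, onOldPrefix bool) {
	if r.Status == statusAutoCreated && !onOldPrefix {
		r.Status = statusNative
	}
}
```

With this shape, a parent that still has images on the old storage prefix is never promoted by a write and stays eligible for a regular import.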

Stop recording intermediate repositories and delete existing empty ones

See #570 (comment 818150257).

Status

We went with the "Stop recording intermediate repositories and delete existing empty ones" approach. All steps were completed except the cleanup. This will be done in #625 (closed).

Edited by João Pereira