Progressive Delivery via Code Migration Proposal for the GitLab.com Container Registry and Self-Managed Container Registries

Context

This proposal explores a progressive migration strategy that keeps the original object storage bucket and uses a single registry deployment. Sid suggested it as a counterproposal to the Gradual Migration Proposal.

Migration Strategy

Phase 1

Begin mirroring new incoming writes to the metadata database
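As a rough illustration of write mirroring, the sketch below writes a tag to the filesystem metadata first and then mirrors it to the database, pulling in the manifest and its blobs if the database does not have them yet (the "backfilling" concern raised under Open Questions). This is not the registry's implementation: `Store`, `Manifest`, `backfill_manifest`, and `write_tag` are hypothetical names, and both stores are modelled as in-memory structures.

```python
from dataclasses import dataclass, field


@dataclass
class Manifest:
    digest: str
    blob_digests: list[str]


@dataclass
class Store:
    """Toy in-memory stand-in for the filesystem metadata or the database (hypothetical)."""
    blobs: set[str] = field(default_factory=set)
    manifests: dict[str, Manifest] = field(default_factory=dict)
    tags: dict[str, str] = field(default_factory=dict)  # tag name -> manifest digest


def backfill_manifest(fs: Store, db: Store, digest: str) -> None:
    """Copy a manifest and its blobs from the filesystem into the database if missing."""
    if digest in db.manifests:
        return
    manifest = fs.manifests[digest]
    # Insert blobs first so the manifest never references a blob the database lacks.
    db.blobs.update(manifest.blob_digests)
    db.manifests[digest] = manifest


def write_tag(fs: Store, db: Store, tag: str, digest: str) -> None:
    """Write a tag to the filesystem metadata, then mirror it to the database."""
    if digest not in fs.manifests:
        raise KeyError(f"unknown manifest {digest}")
    fs.tags[tag] = digest
    # The filesystem may be a superset of the database, so backfill dependencies.
    backfill_manifest(fs, db, digest)
    db.tags[tag] = digest
```

In this sketch the filesystem write happens first and the database write second, so a failure while mirroring leaves the filesystem ahead of the database rather than the other way around. Whether that ordering is the right one depends on which side is treated as the source of truth, which the proposal leaves open.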

Phase 2

Migrate full metadata without stopping writes

Phase 3

Once phase 2 is complete, serve reads from the database
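One way to picture the switch to database reads is a small router that picks the metadata source per repository. The sketch assumes the per-repository rollout, which is only one of the options listed under Open Questions; `MetadataRouter`, `mark_imported`, and `resolve_tag` are hypothetical names, and both sources are plain dictionaries keyed by repository path.

```python
from typing import Optional


class MetadataRouter:
    """Chooses between filesystem and database metadata per repository (hypothetical)."""

    def __init__(
        self,
        filesystem: dict[str, dict[str, str]],
        database: dict[str, dict[str, str]],
    ) -> None:
        self.filesystem = filesystem  # repository -> {tag: manifest digest}
        self.database = database
        self.imported: set[str] = set()

    def mark_imported(self, repository: str) -> None:
        """Record that the full metadata import finished for this repository."""
        self.imported.add(repository)

    def resolve_tag(self, repository: str, tag: str) -> Optional[str]:
        """Return the manifest digest for a tag, or None if the tag is unknown."""
        if repository in self.imported:
            source = self.database
        else:
            source = self.filesystem
        return source.get(repository, {}).get(tag)
```

Rolling out simultaneously for the entire registry would replace the `imported` set with a single flag.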

Discussion

Open Questions

Should we roll out each phase on a repository by repository basis, or simultaneously for the entire registry?
Is there a way to completely unify the approach for all sizes of registry?
When and how should we include the metadata for dangling blobs, so that those blobs may be deleted?
How can write mirroring be adapted such that it can compensate for heterogeneous data in the database and filesystem metadata?
How do we enforce consistency between filesystem and database metadata for potentially long periods of time?
- If these data diverge, which set of metadata should be the source of truth?
Are there additional concerns for storage drivers with poor read-after-write consistency?
Since tags are mutable in that the same tag may reference different manifests over time, how should we resolve conflicts between the filesystem and database if they are encountered during the import phase?
Is it possible to import tags online without a read-only period?
One issue can occur when mirroring writes to both the filesystem metadata and the database while the filesystem metadata is a superset of the database metadata. When you write a tag, for example, and the manifest for that tag exists only on the filesystem, you need to pull that manifest into the database; likewise, if the manifest's blobs are not in the database, you need to pull those in as well. We called this specific problem “backfilling” when we encountered it previously. It might be possible to leave dangling references in the database and pick them up during the import, but we’d have to change the schema to have weaker consistency constraints.
There are only 256 prefixes on the blob side of the registry. The GitLab.com registry has at least hundreds of millions of blobs, so each prefix would contain roughly a million blobs, likely more. It's unclear whether the current blob enumeration techniques used in the registry can handle this, or how we might break this work into smaller chunks so that failed attempts are easier to retry and this portion of the import can be parallelized.
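To make the chunking question concrete, the sketch below splits blob enumeration by longer hexadecimal digest prefixes and estimates the chunk size. It rests on several assumptions: that blobs are stored under `blobs/sha256/<first two hex characters>/<full hex digest>/data`, that the object storage backend can list keys by an arbitrary key prefix, and that digests are spread evenly across prefixes. The function names and the blob count used in the test are illustrative only.

```python
import itertools

HEX_DIGITS = "0123456789abcdef"


def digest_prefixes(length: int) -> list[str]:
    """All hexadecimal digest prefixes of the given length, in sorted order."""
    return ["".join(p) for p in itertools.product(HEX_DIGITS, repeat=length)]


def listing_prefix(hex_prefix: str) -> str:
    """Object key prefix covering every blob whose digest starts with hex_prefix.

    Assumes the layout blobs/sha256/<first two hex chars>/<full hex digest>/data,
    so hex_prefix must be at least two characters long.
    """
    if len(hex_prefix) < 2:
        raise ValueError("prefix must be at least two hex characters")
    return f"blobs/sha256/{hex_prefix[:2]}/{hex_prefix}"


def blobs_per_chunk(total_blobs: int, length: int) -> float:
    """Expected blobs per chunk if digests are evenly spread over 16**length prefixes."""
    return total_blobs / 16 ** length
```

With two-character prefixes there are 256 chunks. Each extra character multiplies the number of chunks by 16 and divides the expected chunk size by the same factor, so each chunk becomes a smaller unit to retry or hand to a parallel worker.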

Advantages

Eliminates the need to transfer blobs
More unified approach for all registry sizes
Importing all tagged manifests and their blobs for GitLab.com will most likely be quicker if we wait to import dangling blobs until after the tagged manifests
Zero read-only time (pending investigation of importing tags online)

Disadvantages

The registry code will need to become more complex to handle the write mirroring and optional database reads
The online garbage collector must not be enabled until all tagged manifests are imported
Overall import time for GitLab.com will most likely be longer

Helpful links

Online garbage collection blueprint

Edited Apr 01, 2021 by Tim Rizzi