Importer: Enable Migration of Blobs to Separate Storage Backend
Problem to Solve
Currently, the import tool only records metadata about the images and blobs stored in the storage backend specified by the configuration passed on the command line.
To facilitate gradual migrations of large registry deployments, in which a new, separate registry instance backed by a metadata database operates alongside the original registry backed by filesystem metadata, the import tool should also be able to transfer blob content from the storage backend it imports metadata from to the storage backend configured for the new registry.
Additionally, when disablemirrorfs: false is set under the migration section, we'll need to link blob and manifest metadata within the target registry's filesystem repository.
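The linking step could look roughly like the sketch below. It assumes the target registry uses docker/distribution's filesystem layout, where a repository references a blob through a link file under _layers and a manifest revision through a link file under _manifests/revisions, each file containing the digest string. contentPutter, memDriver, writeLink, linkBlob, and linkManifest are hypothetical names; contentPutter is a simplified stand-in for a storage driver's PutContent method.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// v2Root is the prefix docker/distribution uses for filesystem metadata; it is
// an assumption about the target registry's layout.
const v2Root = "/docker/registry/v2"

// contentPutter is a hypothetical stand-in for a storage driver's PutContent
// method (the real one also takes a context).
type contentPutter interface {
	PutContent(path string, content []byte) error
}

// splitDigest splits "sha256:<hex>" into its algorithm and hex parts.
func splitDigest(dgst string) (string, string, error) {
	alg, hex, ok := strings.Cut(dgst, ":")
	if !ok || alg == "" || hex == "" {
		return "", "", fmt.Errorf("malformed digest %q", dgst)
	}
	return alg, hex, nil
}

// writeLink writes a link file whose content is the digest itself, at
// repositories/<repo>/<sub...>/<alg>/<hex>/link below v2Root.
func writeLink(d contentPutter, repo, dgst string, sub ...string) error {
	alg, hex, err := splitDigest(dgst)
	if err != nil {
		return err
	}
	elems := append([]string{v2Root, "repositories", repo}, sub...)
	elems = append(elems, alg, hex, "link")
	return d.PutContent(path.Join(elems...), []byte(dgst))
}

// linkBlob records that repo references a blob (a layer link).
func linkBlob(d contentPutter, repo, dgst string) error {
	return writeLink(d, repo, dgst, "_layers")
}

// linkManifest records a manifest revision for repo.
func linkManifest(d contentPutter, repo, dgst string) error {
	return writeLink(d, repo, dgst, "_manifests", "revisions")
}

// memDriver is an in-memory fake used to show which paths get written.
type memDriver map[string][]byte

func (m memDriver) PutContent(p string, content []byte) error {
	m[p] = append([]byte(nil), content...)
	return nil
}

func main() {
	d := memDriver{}
	if err := linkBlob(d, "group/project", "sha256:abc123"); err != nil {
		panic(err)
	}
	if err := linkManifest(d, "group/project", "sha256:def456"); err != nil {
		panic(err)
	}
	for p, content := range d {
		fmt.Println(p, string(content))
	}
}
```

Tag links would presumably need similar treatment; they are left out to keep the sketch short.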
Challenges and Possible Solutions
Configuration and Credentials
The import tool will need access to two independently configured storage backends. This is an extremely high level of access, which could have security implications.
Large Blobs
Blobs can be quite large, so we need a sophisticated mechanism to transfer this data to the new storage backend. Ideally, we should avoid pulling the blob down locally and then pushing it, but it's unlikely we can avoid this in a way that is agnostic to the type of storage backend.
GCS
If both buckets are in GCS, we could use the Go client library to copy these objects directly between buckets: https://godoc.org/cloud.google.com/go/storage#Copier. This is potentially more performant than a general solution.
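A server-side copy needs the source and destination object names. Assuming both registries keep docker/distribution's content-addressed blob layout (v2/blobs/<algorithm>/<first two hex characters>/<hex>/data) below their configured root directories, the two names differ only by that root. The sketch computes those keys; blobDataKey is a hypothetical helper, and the exact object-naming scheme of the GCS storage driver is an assumption to verify. The Copier call itself appears only as a comment, since running it requires GCS credentials.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// blobDataKey returns the object name of a blob's data, assuming
// docker/distribution's layout below the driver's root directory and object
// names without a leading slash. Both are assumptions, not confirmed behavior.
func blobDataKey(rootDir, dgst string) (string, error) {
	alg, hex, ok := strings.Cut(dgst, ":")
	if !ok || alg == "" || len(hex) < 2 {
		return "", fmt.Errorf("malformed digest %q", dgst)
	}
	key := path.Join(rootDir, "docker/registry/v2/blobs", alg, hex[:2], hex, "data")
	return strings.TrimPrefix(key, "/"), nil
}

func main() {
	dgst := "sha256:abcdef0123"
	srcKey, err := blobDataKey("/old-root", dgst)
	if err != nil {
		panic(err)
	}
	dstKey, err := blobDataKey("/new-root", dgst)
	if err != nil {
		panic(err)
	}
	// With cloud.google.com/go/storage and an authenticated client, the
	// server-side copy would then be roughly:
	//   dst := client.Bucket(dstBucket).Object(dstKey)
	//   _, err := dst.CopierFrom(client.Bucket(srcBucket).Object(srcKey)).Run(ctx)
	fmt.Println(srcKey, "->", dstKey)
}
```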
General Blob Transfer
In general, we should be able to use the Reader method of a storage driver instantiated from the original bucket, together with a BlobWriter instantiated from a storage.Registry configured for the new bucket, to perform a multi-part transfer. This technique should be able to respect configuration options on the new registry, such as filesystem metadata mirroring, and to deal gracefully with partially failed uploads.
One drawback of this technique is the triangular data transfer: data flows from the old bucket into the importer's memory and then on to the new bucket.