# Importer: Investigate the Effectiveness of a "Pre-Import" Step

## Context
While a repository is being imported, it must be read-only to prevent operations that alter repository data, such as tag switches, tag deletes, blob deletes, and manifest deletes. This ensures that the imported state is both consistent and reflects the most recent state of the repository.
Additionally, a repository import is done entirely within a single transaction, so that the metadata database can maintain a pristine state in the case of errors or interruptions during the process.
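To make the shape of this concrete, here's a minimal Go sketch of a single-transaction import, assuming a `database/sql`-backed metadata store; `importRepository` and `importTags` are hypothetical names, not the registry's actual API:

```go
package importer

import (
	"context"
	"database/sql"
)

// importTags is a placeholder for the per-tag work; it is expanded in a
// later sketch.
func importTags(ctx context.Context, tx *sql.Tx, repoPath string) error {
	// ... list tags, resolve digests, import manifests and blobs ...
	return nil
}

// importRepository sketches the current approach: the entire repository
// import runs inside one transaction, so any error or interruption rolls
// the metadata database back to its pristine state.
func importRepository(ctx context.Context, db *sql.DB, repoPath string) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // a no-op once Commit has succeeded

	if err := importTags(ctx, tx, repoPath); err != nil {
		return err
	}
	return tx.Commit()
}
```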
Considering importing only tagged images, the time that the repository must be read-only and the length of time that the transaction must be held open increase with the following:
- Number of tags
- Number of layers
- Whether blob transfer is enabled, and if so:
  - The size of the blobs
  - The regions of both buckets
While most repositories are small, the overall length of a repository import can rapidly become an issue: #313 (closed)
If we could determine a way to offload as much of the import work as possible before we need to stop incoming writes and open a database transaction, we could achieve the same results with a shorter critical period.
## Exploring a Repository Import

### Current Operations
We'll begin by running through what happens, in a highly simplified example, when tagged images from a single repository are imported.
#### Import Flowchart

```mermaid
graph TD
start{start} --> c[Find or Create repository]
subgraph read-only and transaction
c --> lt[list tags from filesystem]
lt --> |for each tag| f[get the digest associated with the tag from the filesystem]
f --> g[find a manifest by digest in the database]
g --> h{manifest in the database?}
h -->|yes| i[find or create tag in database]
i --> tagQ{last tag?}
tagQ --> |yes| finR[repository import complete]
tagQ --> |no|f
h ==>|likely no| k[import manifest]
k --> getM[find manifest by digest in the filesystem]
getM --> impB[Import blobs]
impB --> getB[get blob from filesystem]
getB --> makeB[create blob in the database]
makeB --> BQ{Last blob?}
BQ -->|yes| finM[manifest import complete]
BQ -->|no| impB
finM --> i
end
finR --> fin{finish}
```
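Expressed as code, the per-tag loop in this chart might look roughly like the sketch below, which fills in the `importTags` placeholder from the earlier sketch; all of the helper functions are hypothetical stand-ins, not the importer's real API:

```go
// Hypothetical stand-ins for the filesystem and database helpers used in
// these sketches; the real importer's API differs.
func listTagsFromFilesystem(repoPath string) ([]string, error)               { return nil, nil }
func resolveTagDigest(repoPath, tag string) (string, error)                  { return "", nil }
func manifestExistsInDB(ctx context.Context, tx *sql.Tx, digest string) bool { return false }
func importManifestAndBlobs(ctx context.Context, tx *sql.Tx, repoPath, digest string) error {
	return nil
}
func findOrCreateTagInDB(ctx context.Context, tx *sql.Tx, tag, digest string) error { return nil }

// importTags expands the placeholder from the earlier sketch, mirroring
// the per-tag loop in the flowchart above.
func importTags(ctx context.Context, tx *sql.Tx, repoPath string) error {
	tags, err := listTagsFromFilesystem(repoPath)
	if err != nil {
		return err
	}
	for _, tag := range tags {
		digest, err := resolveTagDigest(repoPath, tag)
		if err != nil {
			return err
		}
		// The expensive branch: when the manifest is not yet in the
		// database (the common case), import it and all of its blobs
		// while the long-running transaction is held open.
		if !manifestExistsInDB(ctx, tx, digest) {
			if err := importManifestAndBlobs(ctx, tx, repoPath, digest); err != nil {
				return err
			}
		}
		if err := findOrCreateTagInDB(ctx, tx, tag, digest); err != nil {
			return err
		}
	}
	return nil
}
```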
Looking at this diagram, we can see that significantly more operations need to occur when we do not find the manifest already present in the database, which we can assume to happen most of the time: only images with multiple tags will skip this step.
Given this, I think we can add a step which will make it so that the manifest is almost always already present in the database. This pre-import step can be run without read-only mode and with many short transactions. The import manifest step has been moved into its own subgraph to preserve readability:
```mermaid
graph TD
subgraph preimport
pia[begin preimport] --> piaa[create or find repository in database]
piaa --> pib[list tags from filesystem]
pib --> |for each tag| pic[get the digest associated with the tag from the filesystem]
pic --> pid[find the manifest by digest in the database]
pid --> pie{manifest in the database?}
subgraph transaction
pie ==> |likely no| pih[import manifest]
end
pih --> pitq
pie --> |yes| pitq{last tag?}
pitq --> |yes| pif[preimport complete]
pitq --> |no| pic
end
subgraph import
ia[begin import] -->ib
subgraph read-only and transaction
ib[create or find repository in database] --> ic[list tags from filesystem]
ic --> |for each tag| id[get the digest associated with the tag from the filesystem]
id --> ie[find the manifest by digest in the database]
ie --> if{manifest in the database?}
if ==> |likely yes| ig[create tag in database]
if --> |no| ij[import manifest]
ij --> ig
ig --> itq{last tag?}
itq --> |no| id
end
itq --> |yes| ih[import complete]
end
subgraph import manifest
mia[begin import manifest]-->mib[find manifest by digest in the filesystem]
mib --> mic[Import blobs]
mic --> mid[get blob from filesystem]
mid --> mie[create blob in the database]
mie --> mif{Last blob?}
mif -->|yes| mig[manifest import complete]
mif -->|no| mid
end
start{start} --> pia
pif -->ia
ih --> fin{finish}
```
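The pre-import phase might then look like the following sketch, reusing the same hypothetical helpers as above: the traversal is identical, but no tags are created and each missing manifest is imported in its own short transaction, so the repository can stay writable:

```go
// preImportRepository sketches the pre-import phase: the same traversal
// as the import, but no tags are created and each missing manifest gets
// its own short transaction, so no read-only mode is needed.
func preImportRepository(ctx context.Context, db *sql.DB, repoPath string) error {
	tags, err := listTagsFromFilesystem(repoPath)
	if err != nil {
		return err
	}
	for _, tag := range tags {
		digest, err := resolveTagDigest(repoPath, tag)
		if err != nil {
			return err
		}
		// One short transaction per tag; the chart performs the existence
		// check before opening the transaction, which a real
		// implementation could do with a plain query instead.
		tx, err := db.BeginTx(ctx, nil)
		if err != nil {
			return err
		}
		if manifestExistsInDB(ctx, tx, digest) {
			tx.Rollback() // nothing to do; already pre-imported
			continue
		}
		if err := importManifestAndBlobs(ctx, tx, repoPath, digest); err != nil {
			tx.Rollback()
			return err
		}
		if err := tx.Commit(); err != nil {
			return err
		}
	}
	return nil
}
```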
## Discussion

### Running Pre-Import on an Active Repository
During the pre-import step, we do all the same work as the current import step, except that we do not create any tags in the database or associate them with their respective manifests. Since the pre-imported manifests and their blobs are untagged, they are on track to be garbage collected; therefore, any manifest deletes, tag deletes, or tag switches away from an existing manifest made to the repository while the pre-import phase is running will be honored.
In the case that new manifests were tagged while the pre-import phase was running, we will pick them up on the import pass, which works the same way as it does now.
### Outcomes of Writes During Pre-Import

The following table enumerates the outcomes of writes which can be initiated by the API during the import of a single repository, using a pre-import phase, importing only tagged manifests, and transferring blobs to a new storage bucket. These are the same criteria as those proposed for the GitLab.com registry migration.
| Entity | Methods | Result |
|---|---|---|
| blob upload | `POST` or `PATCH` | In-progress blob uploads will continue as normal throughout pre-import, as they are not tracked by the metadata database at all. |
| blob upload | `PUT` | Blobs created by finished blob uploads may be missed during pre-import, but they will be picked up during the import phase if their related manifest was pushed and tagged successfully. |
| blob upload | `POST` (cross-repository blob mount) | Blobs mounted from other repositories may be missed during pre-import, but they will be picked up during the import phase if their related manifest was pushed and tagged successfully. |
| blob upload | `POST`, `PATCH`, and `PUT` | Multipart blob uploads which begin during the pre-import phase and do not complete before it ends may be interrupted when their repository is switched to read-only mode. The client may need to retry the blob upload or, if that fails, the entire push. |
| blob upload | `DELETE` | In-progress blob uploads which are cancelled during pre-import will stop as normal, as they are not tracked by the metadata database at all. |
| blob | `DELETE` | If a blob is deleted after one of its associated manifests was pre-imported, the import step would import the manifest with the deleted blob incorrectly still linked. |
| tag | `DELETE` | The manifest associated with the tag will still be pre-imported, but since the tag will not be present during the import phase, the garbage collector will delete the manifest if no other tag references it. |
| manifest | `PUT` (new tagged manifest) | A tagged manifest pushed during pre-import will be imported during the import step instead. |
| manifest | `PUT` (tag existing manifest) | The manifest will have been imported during the pre-import step, and the new tag will be associated with it during the import step. The original manifest the tag was pointing to will not be tagged during the import step, and it will be garbage collected. |
| manifest | `DELETE` | The manifest will have been imported during the pre-import step, but since all associated tags will have been removed, it will not be tagged during the import step, and it will be garbage collected. |
Blob delete seems to be the trickiest part of this. I noticed that the importer does not perform access checks to determine whether a blob was deleted from a repository: https://gitlab.com/gitlab-org/container-registry/-/issues/331, so we don't handle this case properly outside of pre-import either. This endpoint is not used by GitLab, and I think we can justifiably assume that only admins would perform this action, so it should be enough to document that it must not be performed during pre-import. Since this is not part of any typical workflow, I don't expect users to be impacted.
## Advantages to this Approach

### Maximizing Repository Availability
This approach primarily allows us to reduce the time we need to hold a transaction open and remain in read-only mode by front-loading as much of the manifest import logic as we can. Importing manifests, and particularly their blobs, represents a significant amount of the work of the importer.
This effectively increases the size of repository that we would be able to import without some form of manual intervention.
### Import Time Estimation

The pre-import step, for lack of a better term, normalizes repository import times to the number of tags. Given two repositories which have not been pre-imported, both with 100 tags, we cannot accurately predict import time from the number of tags alone: if one repository averaged 5 layers per manifest and the other 10, we could expect the first to import in roughly half the time.

Our essential issue here is that the blobs we need to import have an indirect relationship to the tags: while we know that each tag references a single manifest, each manifest references an unknown number of blobs. We are therefore unable to make reliable predictions about how long it will take to import a single tag.

If we perform a pre-import before the import of each repository, we could then expect both imports to take roughly the same amount of time, as we would only need to operate on the tags and the manifests they reference directly: we only need to retrieve each manifest from the database and associate it with its tag.
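As a rough model (every term here is an illustrative variable, not a measured constant), the time to import a repository without a pre-import is approximately:

```math
T_{\text{import}} \approx \sum_{\text{tags}} t_{\text{tag}} + \sum_{\text{missing manifests}} \left( t_{\text{manifest}} + n_{\text{blobs}} \cdot t_{\text{blob}} \right)
```

The second sum is the unpredictable part, since the number of blobs varies from manifest to manifest and dominates when blob transfer is enabled. A pre-import moves that entire sum out of the critical period, leaving the import phase at roughly the number of tags multiplied by a small, fixed per-tag cost.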
### Enabling Parallel Imports
Since we're breaking up work into more, smaller transactions, this should enable us to more easily import repositories in parallel, as transactions would be shorter lived and reduced in scope and therefore less likely to conflict with one another.
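As a sketch of what that could look like, reusing the hypothetical `preImportRepository` and `importRepository` from the earlier sketches and glossing over the read-write to read-only switch that must happen between the two phases:

```go
// importAll pre-imports and then imports each repository in parallel,
// bounded by a fixed number of workers; a hypothetical sketch, not the
// importer's actual concurrency model. Requires "sync" in addition to
// the earlier imports.
func importAll(ctx context.Context, db *sql.DB, repoPaths []string, workers int) []error {
	sem := make(chan struct{}, workers) // bounds concurrent imports
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		errs []error
	)
	for _, path := range repoPaths {
		wg.Add(1)
		go func(path string) {
			defer wg.Done()
			sem <- struct{}{}
			defer func() { <-sem }()

			err := preImportRepository(ctx, db, path)
			if err == nil {
				// In reality the repository must be switched to
				// read-only mode before this step begins.
				err = importRepository(ctx, db, path)
			}
			if err != nil {
				mu.Lock()
				errs = append(errs, err)
				mu.Unlock()
			}
		}(path)
	}
	wg.Wait()
	return errs
}
```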
This was tested using an experimental branch, and the results were positive. Using the same setup as the realistic tests in #313 (closed), we were able to import a matching number of tags, blobs, and manifests within 66 minutes without blob transfer and 2.8 hours with blob transfer.
The test data we use is heavily biased towards a handful of large repositories; in this case, the speed of the import is limited by the length of time it takes to import the largest repository. So even though we have 10 TiB of data, all but five repositories were imported within 23.6 minutes with blob transfer.
These results are promising and prove the concept, but we will still need to do large scale testing to properly test the speed and reliability of this approach.
## Disadvantages to this Approach

### Complexity
This approach is much less straightforward than the one we're using currently. Additionally, determining what work we can and cannot do in the pre-import step requires significant familiarity with the filesystem storage, the metadata database, and the workings of the online garbage collector. This provides a significant surface area for flaws to be introduced into the import logic over time.
### Reduplication of Effort

The pre-import and import steps will perform some of the same work twice: finding or creating the repository in the database, listing tags from the filesystem, getting the digest associated with each tag from the filesystem, and finding the manifest in the database again for each tag. Of these, listing tags from the filesystem is a particularly expensive operation which can take a significant amount of time to return results as repositories grow larger.
### Ordering and Time Constraints

To achieve the results we expect, the pre-import must be run before the import, and the import should be run before the online garbage collector has cleaned up the entities that were pre-imported. By default, the GC review delay is 24 hours, and it's possible for the pre-import times of the very largest repositories to encroach on this.

Additionally, we need to keep the repository read-only for the duration of the import phase, so we'll need to move from read-write mode to read-only mode between the pre-import and import phases. If we do not wish for the importer process to coordinate this, we will need to take special care to adhere to these constraints, since it's likely that each phase will be handled by a separate process.
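One way to guard the timing half of this constraint, as a minimal sketch (the function and its parameters are hypothetical, and `gcReviewDelay` would come from the garbage collector's configuration; uses `fmt` and `time`):

```go
// checkImportWindow guards the ordering constraint: refuse to start the
// import phase if the pre-import finished longer ago than the GC review
// delay, since the pre-imported manifests may already have been garbage
// collected. Hypothetical helper, not an existing importer function.
func checkImportWindow(preImportDone time.Time, gcReviewDelay time.Duration) error {
	if elapsed := time.Since(preImportDone); elapsed > gcReviewDelay {
		return fmt.Errorf("pre-import finished %s ago, exceeding the GC review delay of %s; re-run the pre-import",
			elapsed, gcReviewDelay)
	}
	return nil
}
```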