Some Container Registry Repositories Created After 2022-01-23 are Served by the Old Code Path

Context

During the ongoing container registry migration, repositories created after 2022-01-23 should all be served by the new code path, and the Rails migration workers ignore repositories created after this date when selecting repositories to import.

Problem

Finding Affected Repositories

It appears that some repositories have creation dates in Rails after 2022-01-23 but are considered old by the container registry. Since these repositories now contain data, they must be migrated to the new side in the same way that historical container repositories have been migrated. To detect these repositories, we can look at this search (internal) and substitute the path from json.vars_name into the following command on a Rails console:

pp ContainerRepository.find_by_path(ContainerRegistry::Path.new('<PATH>'))

If the created_at date is after 2022-01-23 and the migration_state is "default", the repository is being served by the old code path on the registry while not being considered by Rails or our metrics as a repository that needs to be imported.
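
The check above can be written as a small predicate. This is a minimal sketch in plain Ruby, not code from the GitLab codebase: RepoInfo, CUTOFF, and needs_investigation? are hypothetical names, the struct only stands in for the created_at and migration_state attributes printed by the console command, and the sample paths and the "other" state are placeholder data. Comparing whole dates, so that anything created on 2022-01-24 or later counts as "after", is also an assumption.

```ruby
require 'date'

# Hypothetical stand-in for the two attributes of interest on a
# ContainerRepository record; this is not the real model.
RepoInfo = Struct.new(:path, :created_at, :migration_state)

# Cutoff date taken from the text above.
CUTOFF = Date.new(2022, 1, 23)

# True when a record matches the pattern described above: created after
# the cutoff, yet still in the "default" migration state.
def needs_investigation?(repo)
  repo.created_at > CUTOFF && repo.migration_state == 'default'
end

# Placeholder sample data; paths and the 'other' state are made up.
sample = [
  RepoInfo.new('group/old-project/image', Date.new(2021, 6, 1), 'default'),
  RepoInfo.new('group/new-project/image', Date.new(2022, 3, 15), 'default'),
  RepoInfo.new('group/other-project/image', Date.new(2022, 3, 15), 'other')
]

# Print only the paths that would need a closer look.
sample.select { |repo| needs_investigation?(repo) }.each { |repo| puts repo.path }
```

Only the second sample record is printed: the first predates the cutoff and the third is not in the "default" state.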

Possible Root Causes

Parent Created After Child

It appears there is a bug in the code that determines whether a repository exists, found here: https://gitlab.com/gitlab-org/container-registry/-/blob/master/registry/storage/registry.go#L429

Looking at this code, the following scenario is possible:

  • Historic repository r1 exists on the old filesystem at the following path: root/sub1/sub2/sub3/repository
  • New repository r2 is created somewhere on the parent path, such as root/sub1/sub2/sub3

Since the path root/sub1/sub2/sub3 already exists on the filesystem as a parent directory of r1, the registry treats r2 as an existing repository and serves its requests from the old filesystem.

I was able to find a repository created recently enough that the first log entries from the registry side are still visible. At this time, the repository appears to be a leaf, so some other issue seems to be in effect, either instead of or in addition to the Parent Created After Child issue.

Solution

To address this, we could change the repository detection code to look for a _manifests/ directory within the path. Its presence would indicate that manifests have been written to this repository, which should differentiate an in-use repository from a directory that is merely part of a parent path.
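
The difference between the two checks can be illustrated with a throwaway directory tree. This is a minimal sketch in Ruby under stated assumptions, not the registry's Go implementation: path_exists? and in_use_repository? are hypothetical helpers, and the layout is simplified to the example paths from the scenario above without modelling the registry's real storage driver or path prefix.

```ruby
require 'fileutils'
require 'tmpdir'

# Simplified version of the behaviour described in the scenario: any
# existing directory at the repository path counts as a repository.
def path_exists?(root, repo_path)
  Dir.exist?(File.join(root, repo_path))
end

# Proposed behaviour: only a path that directly contains a _manifests/
# directory counts as an in-use repository.
def in_use_repository?(root, repo_path)
  Dir.exist?(File.join(root, repo_path, '_manifests'))
end

Dir.mktmpdir do |root|
  r1 = 'root/sub1/sub2/sub3/repository' # historic repository
  r2 = 'root/sub1/sub2/sub3'            # new repository on the parent path

  # Only r1 has had manifests written to it on the old filesystem.
  FileUtils.mkdir_p(File.join(root, r1, '_manifests'))

  puts path_exists?(root, r2)       # true: r2 is mistaken for an old repository
  puts in_use_repository?(root, r2) # false: no _manifests/ directly under r2
  puts in_use_repository?(root, r1) # true: r1 is still detected
end
```

The parent path exists only because r1 lives beneath it, so the plain existence check matches r2 while the _manifests/ check does not.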

Deletion and Recreation of Repositories on Rails

It might be possible that a repository at a given path is deleted on the Rails side and then recreated later. The container registry could leave historical paths behind on the filesystem, such as the _uploads/ or _layers/ directories, and thus determine that these repositories are old.

Solutions

Check for _manifests/

Checking for a _manifests/ directory could resolve this issue, but this scenario might leave that directory behind as well.
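
Both outcomes can be shown with a small sketch. It is written in Ruby under stated assumptions rather than taken from the registry: manifests_present? is a hypothetical helper, the paths group/a and group/b are made up, and whether a deletion really leaves _manifests/ behind is exactly the open question raised above.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: treat a path as an in-use repository only if it
# directly contains a _manifests/ directory.
def manifests_present?(root, repo_path)
  Dir.exist?(File.join(root, repo_path, '_manifests'))
end

Dir.mktmpdir do |root|
  # Deleted repository that left only _uploads/ and _layers/ behind.
  FileUtils.mkdir_p(File.join(root, 'group/a/_uploads'))
  FileUtils.mkdir_p(File.join(root, 'group/a/_layers'))

  # Deleted repository that also left an empty _manifests/ behind.
  FileUtils.mkdir_p(File.join(root, 'group/b/_manifests'))

  puts manifests_present?(root, 'group/a') # false: would be treated as new
  puts manifests_present?(root, 'group/b') # true: still mistaken for an old repository
end
```

If the second case occurs in practice, the directory check alone would not be sufficient.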

Retool Auth Eligibility

We could use the Auth Eligibility from phase one to have Rails send a header indicating that the repository is new enough that it should not be served by the old code path. This should help align what Rails and the registry consider to be new repositories.

Cleanup

No matter what the solution is, we need to push the 2022-01-23 cutoff date forward so that Rails will start trying to migrate repositories created after this date, since we now know that they are not guaranteed to be on the container registry database. We cannot disable the cutoff date entirely, as we would then need to migrate each newly created repository, which is not likely feasible.

I think it's worthwhile to push the cutoff date forward while we investigate a solution. This issue has been present for half a year at this point, and therefore we have a backlog of repositories that we need to validate.

Edited by Hayley Swimelar