
Container Registry: Perform Inventory of Repositories

Context

We are deploying and migrating to a new container registry for GitLab.com. As part of this effort, we need to better understand the data present in object storage for the current GitLab.com registry. These data will allow us to produce more accurate estimates of migration time, as well as identify large repositories (in terms of tags), which is a critical factor in the success of the migration.

What data will be gathered

We'll need a complete list of repositories and the total number of tags in each. It's important to note that these data may include customer names, so the details must not be publicly accessible.

How much data will be collected

We expect upwards of 500,000 repositories. For each of those we'll store a path, such as registry.gitlab.com/gitlab-org/build/cng/gitlab-container-registry, paired with an integer representing the tag count.
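To make the shape and rough size of this dataset concrete, here is a minimal sketch in Python. The names `RepositoryRecord` and `estimate_bytes` are hypothetical, the tag count in the example is made up, and the 8-byte integer width is an assumption; none of this reflects the actual tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepositoryRecord:
    """One inventory row: a repository path and its tag count (hypothetical type)."""
    path: str
    tag_count: int


def estimate_bytes(repo_count: int, avg_path_bytes: int, int_bytes: int = 8) -> int:
    """Rough raw size: one path plus one fixed-width integer per repository.

    Ignores any storage overhead (indexes, delimiters, encoding), so treat
    the result as a lower bound rather than a real sizing figure.
    """
    return repo_count * (avg_path_bytes + int_bytes)


# Example path taken from the text above; the tag count is illustrative only.
example = RepositoryRecord(
    path="registry.gitlab.com/gitlab-org/build/cng/gitlab-container-registry",
    tag_count=42,
)

# Assumes every path is about as long as the example, which is a guess.
estimated = estimate_bytes(500_000, len(example.path.encode("utf-8")))
print(f"approx. {estimated / 1_000_000:.0f} MB of raw data")
```

Under these assumptions the raw data is in the tens of megabytes, so the storage question is more about access control and reuse than about volume.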

How will this data be collected and stored

We are developing a tool to perform this import here: gitlab-org/container-registry#337 (closed). One of the unresolved questions is how and where to store the generated data. Ideally, this tool will also be used to populate a list of repositories for the Migration Coordination Service, so this question has implications for that effort as well as for the immediate data we need to gather.

Cross-reference with GitLab Rails to obtain a namespace's tier

We'll need to identify the tier of each namespace in the registry. For example, for a repository my-group/my-project, we need to be able to identify the tier of my-group on the Rails side.
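As a sketch of the registry side of this cross-reference, the snippet below derives the top-level namespace from a repository path and joins it against a tier lookup. The function names and the `tiers_by_namespace` mapping are hypothetical; how the tier is actually obtained from Rails is not specified here, so a plain dictionary stands in for it.

```python
from typing import Dict, Optional, Tuple

# Host prefix as it appears in the example path; assumed to be optional in the input.
REGISTRY_HOST = "registry.gitlab.com"


def top_level_namespace(repository_path: str) -> str:
    """Return the first path segment, e.g. 'my-group' for 'my-group/my-project'."""
    path = repository_path.strip("/")
    prefix = REGISTRY_HOST + "/"
    if path.startswith(prefix):
        path = path[len(prefix):]
    namespace = path.split("/", 1)[0]
    if not namespace:
        raise ValueError(f"no namespace in repository path: {repository_path!r}")
    return namespace


def attach_tiers(
    tag_counts: Dict[str, int],
    tiers_by_namespace: Dict[str, str],
) -> Dict[str, Tuple[int, Optional[str]]]:
    """Pair each repository's tag count with its namespace's tier.

    The tier is None when the namespace is missing from the lookup, so
    unmatched repositories stay visible instead of being dropped.
    """
    return {
        path: (count, tiers_by_namespace.get(top_level_namespace(path)))
        for path, count in tag_counts.items()
    }


# Illustrative data only; the tier name and tag count are placeholders.
joined = attach_tiers({"my-group/my-project": 3}, {"my-group": "example-tier"})
print(joined)
```

Keeping unmatched namespaces as `None` is a deliberate choice in this sketch: it makes gaps in the Rails data easy to count before they affect any estimate.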

Edited by João Pereira