Container Registry: Perform Inventory of Repositories
Context
We are deploying and migrating to a new container registry for GitLab.com. As part of this effort, we need to better understand the data present in object storage for the current GitLab.com registry. These data will allow us to produce more accurate estimates of migration time, and to identify large repositories (in terms of tags), which is a critical factor in the success of the migration.
What data will be gathered
We'll need a complete list of repositories and the total number of tags in each. It's important to note that these data may include customer names, so the details must not be publicly accessible.
How much data will be collected
We expect upwards of 500,000 repositories, and for each of those we'll store a path such as registry.gitlab.com/gitlab-org/build/cng/gitlab-container-registry paired with an integer representing the tag count.
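To give a rough sense of scale, the record shape described above can be sketched in Python as follows. The names RepositoryRecord and estimate_size_bytes are hypothetical, and the average path length and the 8-byte integer width are assumptions chosen for illustration, not measured values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepositoryRecord:
    """One inventory entry: a repository path and its tag count (hypothetical shape)."""
    path: str
    tag_count: int


def estimate_size_bytes(repo_count: int, avg_path_bytes: int, count_bytes: int = 8) -> int:
    """Rough raw size of the inventory, ignoring any storage or encoding overhead.

    avg_path_bytes and count_bytes are assumptions supplied by the caller.
    """
    return repo_count * (avg_path_bytes + count_bytes)


# Example record using the path from the text; the tag count is made up.
example = RepositoryRecord(
    path="registry.gitlab.com/gitlab-org/build/cng/gitlab-container-registry",
    tag_count=42,
)

# 500,000 repositories with an assumed average path length of 100 bytes.
total = estimate_size_bytes(500_000, avg_path_bytes=100)
```

Under these assumptions the raw data come to 54,000,000 bytes (about 54 MB), which suggests the full inventory would be small enough to fit in a single file or database table, whichever storage option is chosen.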
How will this data be collected and stored
We are in the process of developing a tool to perform this import here: gitlab-org/container-registry#337 (closed). One of the unresolved questions is how and where to store the data generated. Ideally, this tool will also be used to populate a list of repositories for the Migration Coordination Service, so this question also has implications for that effort in addition to the immediate data we need to gather.
Cross-reference with GitLab Rails to obtain a namespace's tier
We'll need to identify the tier of each namespace in the registry. For example, for a repository my-group/my-project, we need to be able to identify the tier of my-group on the Rails side.
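As a minimal sketch of the first step of that cross-reference, the function below extracts the namespace from a repository path. It assumes the namespace of interest is the first path segment, as in the my-group/my-project example, and that paths may optionally carry the registry.gitlab.com host prefix. The function name top_level_namespace is hypothetical, and the actual tier lookup on the Rails side is not shown.

```python
def top_level_namespace(repository_path: str, host: str = "registry.gitlab.com") -> str:
    """Return the first path segment of a repository path.

    Strips an optional "<host>/" prefix first. Assumes the tier is looked up
    by this first segment, which is an assumption of this sketch.
    """
    path = repository_path.strip("/")
    prefix = host + "/"
    if path.startswith(prefix):
        path = path[len(prefix):]
    namespace = path.split("/", 1)[0]
    if not namespace:
        raise ValueError(f"no namespace found in {repository_path!r}")
    return namespace


short_form = top_level_namespace("my-group/my-project")
full_form = top_level_namespace(
    "registry.gitlab.com/gitlab-org/build/cng/gitlab-container-registry"
)
```

Here short_form is "my-group" and full_form is "gitlab-org". Grouping the inventory by this value before querying Rails would mean one tier lookup per namespace rather than one per repository.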