
Add support for the Container Registry new tags API

David Fernandez requested to merge 367728-update-container-registry-client into master

🐗 Context

Container Registry data lives on two sides: the Rails backend and the Container Registry backend. At its core, it consists of only two "objects":

  • Repositories. These live on both the Rails side (under the ContainerRepository model) and the Container Registry side.
  • Tags. One repository can host many tags. These only live in the Container Registry.

This means that for anything related to tags, the Rails backend needs to query the Container Registry (using the provided API). One example relevant to this MR is cleanup policies, which are a way to automatically remove stale or unwanted tags (as these take up space on Object Storage). Those policies are executed by background workers that currently pretty much hammer the Container Registry API to gather information on tags. Among other things, policies need the creation timestamp of each tag to order them. The available API returns the timestamp of a single tag per request. Now, imagine what happens when the policy has to go through a repository that has 60K or 100K tags? Yeah, 💥 that many network requests to the Container Registry.

We knew that to improve the situation, we needed an API that could return a set of tags with their creation timestamps in a single call (e.g. a paginated API). This evolution was gated on the Container Registry side by a data migration. That migration involves reorganizing object metadata and, among other things, storing this metadata in a database, which is easier to query.
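To put rough numbers on the difference, here is a small back-of-the-envelope sketch. It assumes one request per tag on the current path and the default page size of 100 mentioned later in this description; the helper names are made up for illustration.

```ruby
# Hypothetical helpers, only meant to compare request counts.

# Current path (assumption): one timestamp request per tag.
def per_tag_requests(tag_count)
  tag_count
end

# Paginated path: one request per page, rounding up for a partial last page.
def paginated_requests(tag_count, page_size: 100)
  (tag_count + page_size - 1) / page_size
end

[60_000, 100_000].each do |count|
  puts "#{count} tags: #{per_tag_requests(count)} vs #{paginated_requests(count)} requests"
end
```

Under those assumptions, a repository with 60K tags needs 600 paginated requests instead of 60,000 per-tag requests.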

The migration has been ongoing since January and repositories (and their tags) have been "moved" to the new code path on the Container Registry. This allowed the Container Registry to come up with a new tags API that is exactly what's needed here: it returns the list of tags of a given repository in a paginated way. Among other things, the creation timestamp is returned. See container-registry#708 (closed) and the related API documentation here: https://gitlab.com/gitlab-org/container-registry/-/blob/master/docs-gitlab/api.md#list-repository-tags.

This unlocked the work for &8379 (closed), which is the epic to improve cleanup policies execution by using that API endpoint.

This MR is the very first one of that effort. We will focus the changes on:

  • Updating the API client.
  • Updating the ContainerRepository model to provide a function to iterate through pages automatically.
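As a rough illustration of what "iterating through pages automatically" means, here is a minimal, self-contained sketch. It is not the code in this MR: `FakeTagsClient`, `Page` and the `last` marker handling are stand-ins invented for the example, and the real client talks to the Container Registry over HTTP.

```ruby
# Hypothetical stand-in for one API response: the tags of the page and the
# marker to send for the next page (nil when there is no next page).
Page = Struct.new(:tags, :next_marker)

# Hypothetical in-memory client that paginates a sorted list of tag names.
class FakeTagsClient
  def initialize(names)
    @names = names.sort
  end

  # page_size: maximum number of tags per page.
  # last: only return tags sorted after this name (nil for the first page).
  def tags_page(page_size:, last: nil)
    remaining = last ? @names.select { |name| name > last } : @names
    tags = remaining.first(page_size)
    more = remaining.size > page_size
    Page.new(tags, more ? tags.last : nil)
  end
end

# Yields each page of tags to the block until no next page is advertised.
def each_tags_page(client, page_size: 100)
  raise ArgumentError, 'block not given' unless block_given?

  last = nil
  loop do
    page = client.tags_page(page_size: page_size, last: last)
    yield page.tags
    last = page.next_marker
    break if last.nil?
  end
end

client = FakeTagsClient.new((1..20).map(&:to_s))
each_tags_page(client, page_size: 2) { |tags| p tags }
```

The caller only provides a block; following the "next page" marker is hidden inside the iterator, which is the design goal stated above.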

This is issue #367728 (closed).

In follow-up MRs, we will:

  • Create a new service that executes the policy using this paginated API.
  • Update the background workers so that depending on the migration status of the given repository, the old or the new service is used.
    • This is necessary as the migration is still ongoing: some repositories are migrated and some aren't. Those that aren't will need to use the current implementation to execute the policy.
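The dispatch described in that last point could look roughly like the sketch below. Every name in it (`Repository`, `migrated?`, both service classes) is hypothetical and only illustrates the idea; the follow-up MRs will define the real ones.

```ruby
# All names below are hypothetical; they only illustrate the dispatch.
Repository = Struct.new(:path, :migrated) do
  def migrated?
    migrated
  end
end

# Would execute the policy using the new paginated tags API.
class PaginatedCleanupService
  def execute(repository)
    "paginated cleanup for #{repository.path}"
  end
end

# Would execute the policy using the current implementation.
class LegacyCleanupService
  def execute(repository)
    "legacy cleanup for #{repository.path}"
  end
end

# Picks the service based on the migration status of the repository.
def cleanup_service_for(repository)
  repository.migrated? ? PaginatedCleanupService.new : LegacyCleanupService.new
end

puts cleanup_service_for(Repository.new('group/migrated', true)).execute(Repository.new('group/migrated', true))
```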

🔍 What does this MR do and why?

  • Update lib/container_registry/gitlab_api_client.rb to support the new API endpoint.
  • Add lib/gitlab/utils/link_header_parser.rb class to parse Link response headers.
  • Update app/models/container_repository.rb with a function that iterates through pages.
  • Add/Update the related specs.
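For readers unfamiliar with Link response headers, here is a minimal sketch of what parsing one involves. It is not the class added in this MR: the function name, the return shape and the example URL are assumptions made for illustration, and splitting on commas is a simplification that would break on URIs containing commas.

```ruby
require 'uri'

# Parses a Link header such as
#   </some/path?n=2&last=b>; rel="next"
# into a hash like { next: { uri: #<URI::Generic ...> } }.
# Simplified sketch: entries are split on commas and only `rel` is read.
def parse_link_header(header)
  return {} if header.nil? || header.strip.empty?

  header.split(',').each_with_object({}) do |part, result|
    match = part.strip.match(/\A<(?<uri>[^>]+)>\s*;\s*rel="?(?<rel>[^";\s]+)"?/)
    next unless match

    result[match[:rel].to_sym] = { uri: URI(match[:uri]) }
  end
end

# Example header value (assumed shape, for illustration only).
links = parse_link_header('</gitlab/v1/repositories/group/project/tags/list/?n=2&last=b>; rel="next"')
next_uri = links.dig(:next, :uri)
params = URI.decode_www_form(next_uri.query).to_h
puts params['last']
```

The query parameters of the `next` link are what a client would reuse to request the following page; when the header has no `next` entry, iteration stops.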

Note that there is no changelog in this MR since these changes are not called (yet) by any part of GitLab, so there is no impact on any user-facing feature.

🖥 Screenshots or screen recordings

n / a

How to set up and validate locally

The challenge in validating this locally is that the registry shipped with GDK does not support the migration or migrated repositories. Thus, we need to pull the Container Registry project and run it from its master branch.

Let's get started:

  1. Have a GDK ready with the Container Registry support.
  2. Follow these instructions to set up a Container Registry with the new API support.
  3. Create a new project.
  4. To push repositories with many tags, you can use https://gitlab.com/nmezzopera/container-factory or you can manually push several tags to a given repository.

Everything is ready at this point. Open a rails console:

  1. Get the ContainerRepository with many tags:
    repository = ContainerRepository.last
  2. Let's use the existing API to get all tag names (and verify that the setup is correct):
    repository.tags.map(&:name) # => for me: ["1", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "2", "20", "3", "4", "5", "6", "7", "8", "9"]
  3. Now, let's use the new API endpoint to do the same:
    repository.each_tags_page { |tags| pp tags.map(&:name) } # same array
  4. This is not fun as the default page size is way bigger (100) than the total number of tags (20). Let's use a smaller page size:
    repository.each_tags_page(page_size: 2) { |tags| pp tags.map(&:name) } # 10 arrays with 2 elements each

🚥 MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by David Fernandez
