
Add a facade for all cleanup tags services

David Fernandez requested to merge 367731-switch-for-cleanup-tags-services into master

🗞 Context

Container Registry data lives on two sides: the Rails backend and the Container Registry backend. At its core, there are only two kinds of "objects":

  • Repositories. These live on both the Rails side (under the ContainerRepository model) and the Container Registry side.
  • Tags. One repository can host many tags. These only live in the Container Registry.

This means that for anything related to tags, the Rails backend needs to query the Container Registry (using the provided API). One example relevant for this MR is cleanup policies, which are a way to automatically remove stale or unwanted tags (as these take space on Object Storage). Those policies are executed by background workers that currently hammer the Container Registry API to gather information on tags. Among other things, policies need the creation timestamp of each tag to order them. The available API returns one timestamp per tag. Now, imagine what happens when the policy has to go through a repository that has 60K or 100K tags? Yeah, 💥 that many network requests to the Container Registry.

We knew that to improve the situation, we needed an API that could return a set of tags with their creation timestamps in a single call (e.g. a paginated API). This evolution was gated on the Container Registry side by a data migration. That migration involves reorganizing object metadata. Among other things, this metadata is now stored in a database, which is easier to query.

The migration has been ongoing since January and repositories (and their tags) have been "moved" to the new code path on the Container Registry. This allowed the Container Registry to come up with a new tags API that is exactly what's needed here: it returns the list of tags of a given repository in a paginated way. Among other things, the creation timestamp is returned. See container-registry#708 (closed) and the related API documentation here: https://gitlab.com/gitlab-org/container-registry/-/blob/master/docs-gitlab/api.md#list-repository-tags.
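To put rough numbers on the difference, here is a minimal sketch comparing the request counts of the two approaches. The page size of 1000 is purely an assumption for illustration, and the method names are hypothetical; the old path is modeled as exactly one request per tag, which is a lower bound.

```ruby
# Old approach: the available API returns one timestamp per tag,
# so we need (at least) one request per tag.
def per_tag_request_count(tag_count)
  tag_count
end

# New approach: a paginated endpoint returns many tags, with their
# creation timestamps, per request. page_size is an assumed value.
def paginated_request_count(tag_count, page_size)
  (tag_count + page_size - 1) / page_size # integer ceiling division
end

tag_count = 60_000
assumed_page_size = 1_000 # hypothetical, for illustration only

puts per_tag_request_count(tag_count)                      # 60000
puts paginated_request_count(tag_count, assumed_page_size) # 60
```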

This unlocked the work for &8379 (closed), which is the epic to improve cleanup policies execution by using that new API endpoint.

That epic has the following iterative implementation plan:

  1. Support the new tags API endpoint in the existing container registry client. That's !94065 (merged).
  2. Add the new cleanup service that will use that new tags API endpoint. That's issue #367729 (closed).
  3. Create a switch with a feature flag so that the proper service is selected according to the repository migration status and the feature flag. 👈 You are here. That's issue #367731 (closed).
  4. Incrementally deploy the change and monitor the impact on cleanup policies.

In this MR, we replace the currently used service with a facade. That facade exposes the same interface as the old service (currently in use) and the new one, and simply delegates the #execute call to the right service.

We will go from:

stateDiagram-v2
    state "Projects::ContainerRepository::CleanupTagsService" as cts
    state "Projects::ContainerRepository::DeleteTagsService" as dts
    [*] --> cts
    cts --> dts

to

stateDiagram-v2
    state "Projects::ContainerRepository::CleanupTagsService" as cts
    state "Projects::ContainerRepository::DeleteTagsService" as dts
    state "Projects::ContainerRepository::Gitlab::CleanupTagsService" as gcts
    state "Projects::ContainerRepository::ThirdParty::CleanupTagsService" as tcts
    [*] --> cts
    cts --> gcts
    cts --> tcts
    gcts --> dts
    tcts --> dts

So, how will we choose to use the new service vs the old one? Well, there are a few requirements:

  • The container repository needs to be migrated. This means that it exists in the shiny new metadata database of the Container Registry.
  • Given that the new service uses a custom API endpoint, we will check the capabilities of the Container Registry to make sure that we're connected to a GitLab Container Registry.
  • As a safety net, we will throw in a feature flag so that we can force all calls to use the old service (existing one).
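The three requirements above can be sketched as a small switch. This is only an illustration under stated assumptions, not the code from this MR: every class, attribute and parameter name below (FakeRepository, CleanupFacadeSketch, feature_enabled:, and so on) is a hypothetical stand-in, and the two inner services are stubs that only report which one ran.

```ruby
# Hypothetical stand-in for a container repository record.
FakeRepository = Struct.new(:path, :migrated, :gitlab_registry, keyword_init: true)

# Stub for the old service (hypothetical name).
class ThirdPartyCleanupSketch
  def initialize(repository)
    @repository = repository
  end

  def execute
    { status: :success, service: :third_party }
  end
end

# Stub for the new service (hypothetical name).
class GitlabCleanupSketch
  def initialize(repository)
    @repository = repository
  end

  def execute
    { status: :success, service: :gitlab }
  end
end

# The facade: same #execute entry point, delegates to one of the two stubs.
class CleanupFacadeSketch
  def initialize(repository, feature_enabled:, log: [])
    @repository = repository
    @feature_enabled = feature_enabled # stands in for the feature flag check
    @log = log                         # stands in for the application logger
  end

  def execute
    use_gitlab = use_gitlab_service?
    marker = use_gitlab ? :gitlab_cleanup_tags_service : :third_party_cleanup_tags_service
    @log << { container_repository_path: @repository.path, marker => true }

    service_class = use_gitlab ? GitlabCleanupSketch : ThirdPartyCleanupSketch
    service_class.new(@repository).execute
  end

  private

  # All three conditions must hold; otherwise fall back to the old service.
  def use_gitlab_service?
    @feature_enabled && @repository.migrated && @repository.gitlab_registry
  end
end

repository = FakeRepository.new(path: 'group/project/image', migrated: true, gitlab_registry: true)
puts CleanupFacadeSketch.new(repository, feature_enabled: true).execute[:service]  # gitlab
puts CleanupFacadeSketch.new(repository, feature_enabled: false).execute[:service] # third_party
```

Because the flag is just one of the conditions in the conjunction, disabling it forces every call back onto the old service regardless of migration status.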

🔬 What does this MR do and why?

  • Move body of Projects::ContainerRepository::CleanupTagsService to Projects::ContainerRepository::ThirdParty::CleanupTagsService.
  • Introduce a new Projects::ContainerRepository::CleanupTagsService that is a switch that will choose between Projects::ContainerRepository::ThirdParty::CleanupTagsService and Projects::ContainerRepository::Gitlab::CleanupTagsService.
    • The switch will log which service is used for monitoring purposes.
  • Update the related specs.
  • The switch is based on repository migration status, container registry capabilities and a feature flag container_registry_new_cleanup_service. Rollout issue: #375037 (closed).

📺 Screenshots or screen recordings

n / a

How to set up and validate locally

Testing this MR is a bit challenging as we need a Container Registry running with the metadata database support. Still, it's not impossible, so let's get started.

  1. Have a GDK ready with Container Registry support.
  2. Follow these instructions to set up a Container Registry with the new API support.
  3. Create a new project.
  4. Let's push 20 tags to a given image:
    $ for i in {1..20}
    do
    docker build -t gdk.test:5000/<project_path>/<image name>:$i .
    docker push gdk.test:5000/<project_path>/<image name>:$i
    done
  5. Check the UI: http://gdk.test:8000/<project_path>/container_registry. You should see your image and the 20 tags.

Everything is ready. Now, let's clean up those tags.

  1. In a rails console:
    service = ::Projects::ContainerRepository::CleanupTagsService.new(container_repository: ContainerRepository.last, current_user: User.first, params: { 'name_regex' => '.*', 'keep_n' => 1 })
    service.execute
    # => {:deleted=>["8", "7", "6", "5", "4", "3", "20", "2", "19", "18", "17", "16", "15", "14", "13", "12", "11", "10", "9"], :status=>:success, :cached_tags_count=>0, :original_size=>20, :before_truncate_size=>20, :after_truncate_size=>20, :before_delete_size=>19, :deleted_size=>19}
  2. Check the UI: only 1 tag remains.
  3. Check log/application_json.log. It should contain this line:
    {"severity":"INFO","time":"2022-09-23T07:54:07.897Z","correlation_id":null,"container_repository_id":131,"container_repository_path":"root/many-tags/image2","project_id":146,"third_party_cleanup_tags_service":true}
    • The old service has been used.
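If you'd rather not eyeball the log, a small helper can report which service a given log line belongs to. This is a sketch only: the helper name and the mapping constant are hypothetical, and it assumes each service is identified by a boolean marker key like the one in the line above.

```ruby
require 'json'

# Marker keys written by the switch, mapped to a short label.
SERVICE_MARKERS = {
  'third_party_cleanup_tags_service' => :third_party,
  'gitlab_cleanup_tags_service' => :gitlab
}.freeze

# Returns :third_party, :gitlab, or nil when the line carries no marker
# (or is not valid JSON).
def cleanup_service_used(log_line)
  entry = JSON.parse(log_line)
  marker = SERVICE_MARKERS.keys.find { |key| entry[key] == true }
  SERVICE_MARKERS[marker]
rescue JSON::ParserError
  nil
end

line = '{"severity":"INFO","time":"2022-09-23T07:54:07.897Z","correlation_id":null,"container_repository_id":131,"container_repository_path":"root/many-tags/image2","project_id":146,"third_party_cleanup_tags_service":true}'
puts cleanup_service_used(line) # third_party
```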

Now, let's enable the feature flag:

  1. Feature.enable(:container_registry_new_cleanup_service)
  2. Redo step 4 from the setup to push a new container repository with 20 tags.
  3. Same execution here, let's build the "facade" service and call it:
    service = ::Projects::ContainerRepository::CleanupTagsService.new(container_repository: ContainerRepository.last, current_user: User.first, params: { 'name_regex' => '.*', 'keep_n' => 1 })
    service.execute
    # => {:original_size=>20, :before_delete_size=>19, :deleted_size=>19, :deleted=>["19", "18", "17", "16", "15", "14", "13", "12", "11", "10", "9", "8", "7", "6", "5", "4", "3", "2", "1"], :status=>:success}
    • The return structure is slightly different because the new service uses neither truncation nor caching.
  4. Let's check log/application_json.log:
    {"severity":"INFO","time":"2022-09-23T08:01:42.195Z","correlation_id":null,"container_repository_id":132,"container_repository_path":"root/many-tags/image3","project_id":146,"gitlab_cleanup_tags_service":true}
    • This time around the new service has been used! 🎉
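To make the difference in return structure concrete, this short sketch compares the keys of the two results shown in the console outputs above. The variable names are only for illustration.

```ruby
# Keys returned by the old service, in the order shown in the first console output.
third_party_keys = %i[
  deleted status cached_tags_count original_size
  before_truncate_size after_truncate_size before_delete_size deleted_size
]

# Keys returned by the new service, from the second console output.
gitlab_keys = %i[original_size before_delete_size deleted_size deleted status]

# Keys that callers can rely on whichever service the switch picks.
shared_keys = third_party_keys & gitlab_keys

# Keys that only exist on the old path (truncation and caching counters).
only_third_party = third_party_keys - gitlab_keys

p only_third_party # [:cached_tags_count, :before_truncate_size, :after_truncate_size]
```

Any consumer of the facade's result should stick to the shared keys, since the truncation and caching counters disappear once the new service is selected.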

The facade service is behaving as expected! 🎉

🛃 MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by David Fernandez
