Fix gitlab cleanup tags service default status

🌲 Context

Users can use the GitLab Container Registry to store their images and tags. Over time, they can accumulate a large number of objects, and those objects take up space on object storage.

To help them with that challenge, we introduced cleanup policies. A policy is a set of parameters that the backend applies to locate tags that can be removed.

Those policies are executed on a daily basis by background jobs.

Now, the other side of this problem is that the data is not entirely located in the rails backend. Container repositories are, but tags live on the Container Registry. As such, at the start of a policy execution for a container repository, we query the container registry to get the tags of that repository.

We're not going to enter all the technical details of the policy execution but at a high level, we have:

  1. (A) The cleanup service.
  2. The cleanup tags service.
    • (B) Basically, get the list of tags (by contacting the container registry) and apply the cleanup parameters to build the "list of tags to delete".

Last piece of context: (A) can create unfinished cleanups. Those cleanups will be retried at a later time (as soon as possible) by the backend.

The problem that we've seen in #395749 (closed) is that we can have cleanups for container repositories that don't exist anymore on the container registry (that's a separate problem, see #395780 (closed)). For those, we have this chain of events:

  1. The cleanup service will execute the cleanup tags service.
  2. The cleanup tags service will ask for tags and receive 404.
    • In this case, we will consider an empty set of tags.
  3. The cleanup tags service will thus not "walk" through the tags.
  4. It will thus return the default response structure.
  5. The problem with that structure is that it lacks the :status key.
  6. The cleanup service will mark the cleanup as unfinished.

The issue is that those cleanups will be retried over and over again: this is an infinite loop of cleanups that delete nothing.

Since it's an infinite loop, these cleanups keep using background job resources, database resources and container registry resources (recall that for each cleanup, we need to get the list of tags). 😱
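
To make the chain of events concrete, here is a minimal sketch in plain Ruby. The method names (default_response_without_status, cleanup_tags, cleanup_outcome) are hypothetical stand-ins and not the actual GitLab classes. It assumes that the status is only set while walking the tags and that the cleanup service treats anything other than :success as unfinished.

```ruby
# Hypothetical, simplified stand-ins; not the actual GitLab services.

# Response structure before the fix: note that there is no :status key.
def default_response_without_status
  { original_size: 0, before_delete_size: 0, deleted_size: 0, deleted: [] }
end

# Simplified cleanup tags service (B): in this sketch, the status is only
# set while walking the tags (an assumption made for illustration).
def cleanup_tags(tags, response)
  tags.each do |_tag|
    # Tag filtering and deletion would happen here.
    response[:status] = :success
  end
  response
end

# Simplified cleanup service (A): anything other than :success is
# treated as an unfinished cleanup (also an assumption).
def cleanup_outcome(response)
  response[:status] == :success ? :finished : :unfinished
end

tags = [] # the registry answered 404, so we consider an empty set of tags
outcome = cleanup_outcome(cleanup_tags(tags, default_response_without_status))
puts outcome # prints "unfinished"
```

With an empty set of tags the loop body never runs, so the response keeps no :status and the cleanup is scheduled for a retry that will end the same way.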

This MR aims to break that loop.

🚒 Solution

We're going to use a very simple solution: have a default :status in the initial response structure. This way, if the structure is untouched during the cleanup tags service execution, we still return status: :success.
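
As an illustration only, the idea can be sketched in plain Ruby as below. The method names are hypothetical and the body of the loop is a placeholder. The hash keys are taken from the console output shown in the validation steps of this description.

```ruby
# Hypothetical sketch of the fix; not the actual service code.

# Initial response structure with a default :status.
def default_response
  { original_size: 0, before_delete_size: 0, deleted_size: 0, deleted: [], status: :success }
end

# Simplified cleanup tags service: starts from the default response and
# only touches it while walking the tags.
def cleanup_tags(tags)
  response = default_response
  tags.each do |_tag|
    # Placeholder for the real filtering and deletion logic.
    response[:original_size] += 1
  end
  response
end

result = cleanup_tags([]) # empty set of tags, e.g. after a 404 from the registry
puts result[:status] # prints "success"
```

Because the default lives in the initial structure, the empty-tags path needs no special case: the untouched response already reports :success, so the cleanup is not marked as unfinished.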

🔬 What does this MR do and why?

  • Update the default status of the response from the container registry cleanup tags service.
  • Update the related specs.

📺 Screenshots or screen recordings

These are background jobs, so we don't really have screenshots 😸

How to set up and validate locally

The challenge in validating this locally is that the registry shipped with GDK does not support the migration or migrated repositories. Thus, we need to pull the Container Registry project and run it out of its master branch.

Let's get started:

  1. Have a GDK ready with the Container Registry support.
  2. Follow these instructions to set up a Container Registry with the new API support.
  3. Create a new project.
  4. To push a repository with some tags, you can use https://gitlab.com/nmezzopera/container-factory or you can manually push several tags to a given repository.
  5. Enable the cleanup policy.

Now, to make the backend consider this repository as migrated, we need to update the migrated? method to:

def migrated?
  true
end

We are now ready to check this MR. In a rails console:

repository = ContainerRepository.last

project = Project.find(<project_id>)

policy_params = project.container_expiration_policy.policy_params

::Projects::ContainerRepository::Gitlab::CleanupTagsService.new(container_repository: repository, params: policy_params.merge('container_expiration_policy' => true)).execute
=>  {:original_size=>0, :before_delete_size=>0, :deleted_size=>0, :deleted=>[], :status=>:success}

Notice the presence of status: :success

🔬 MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by David Fernandez
