
Improve performance of the container repository cleanup tags service

What does this MR do?

This MR resolves #208220 (closed) (and #32228 (closed) as a side-effect). It's a follow-up from !23325 (merged).

Context

In !23325 (merged) we implemented a new way to delete tags from the container registry, which represented an ~80% performance improvement (#31832 (comment 289633770)).

During development, we have identified that the async tag cleanup service (Projects::ContainerRepository::CleanupTagsService) could be improved in a similar way. Now that we're rolling out the Docker Tag Retention & Expiration Policies feature, the performance of this service must be improved.

Most of the improvements achieved in this MR are based on the changes implemented in !23325 (merged). Please refer to that MR for additional context.

Rationale

Current Implementation

To delete a tag, the CleanupTagsService currently requires the following network requests against the Container Registry:

  1. Get the tag manifest digest (HEAD /v2/<repository>/manifests/<tag>);
  2. Get the tag manifest (GET /v2/<repository>/manifests/<tag>);
  3. Get the tag manifest configuration (GET /v2/<repository>/blobs/<digest>);
  4. Delete the tag manifest (DELETE /v2/<repository>/manifests/<digest>).
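Expressed as plain method/path pairs, the sequence above looks like this (a minimal sketch for illustration only; `cleanup_requests` is a hypothetical helper, not part of the actual service code):

```ruby
# Simplified sketch of the four registry requests the current
# CleanupTagsService issues per tag. Hypothetical helper, for
# illustration -- not the real service implementation.
def cleanup_requests(repository, tag, digest)
  [
    ['HEAD',   "/v2/#{repository}/manifests/#{tag}"],    # 1. tag manifest digest
    ['GET',    "/v2/#{repository}/manifests/#{tag}"],    # 2. tag manifest
    ['GET',    "/v2/#{repository}/blobs/#{digest}"],     # 3. manifest configuration
    ['DELETE', "/v2/#{repository}/manifests/#{digest}"]  # 4. delete manifest by digest
  ]
end

cleanup_requests('gitlab-org/gitlab-test', '1', 'sha256:abc').each do |method, path|
  puts "#{method} #{path}"
end
```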

Proposal

With the change proposed in this MR, only one network request is needed to delete a tag: DELETE /v2/<repository>/tags/reference/<tag>.

The responsibility of ensuring that the referenced manifest and other tags are not soft-deleted is offloaded to the Container Registry. This is done by removing all the tag deletion logic from Projects::ContainerRepository::CleanupTagsService and delegating it to Projects::ContainerRepository::DeleteTagsService (please see !23325 (merged) for more details on how it works).

Additionally, in this MR we no longer sort tags by creation date by default. Doing so requires obtaining the tag manifest configuration (request 3 above), which is a costly network operation. Unless we want to keep only the last N tags (i.e. unless the keep_n parameter is used), there is no need to sort tags by creation date.
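A rough sketch of that conditional sort, assuming a simplified tag model (the real service operates on ContainerRegistry::Tag objects and applies regex filters, both omitted here; `tags_to_delete` is a hypothetical name):

```ruby
# Simplified sketch: sort tags by creation date only when keep_n is
# given, since obtaining created_at requires the costly manifest
# configuration request. Hypothetical code, not the actual service.
Tag = Struct.new(:name, :created_at)

def tags_to_delete(tags, keep_n: nil)
  return tags unless keep_n # no keep_n: skip the expensive sort entirely

  # Sort newest first, then keep the newest keep_n tags.
  tags.sort_by(&:created_at).reverse.drop(keep_n)
end
```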

Improvements

Number of Requests

In the worst-case scenario, the current implementation requires 4N network requests to delete N tags, while the proposed change requires only N. The number of requests is especially important when using a cloud storage backend (like S3 or GCS), as each request incurs network latency and additional charges.
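Back-of-the-envelope, for the 100-tag repository used in the throughput tests below (ignoring the registry's own internal storage requests):

```ruby
# Worst-case registry request counts for deleting n tags.
n = 100
current  = 4 * n   # HEAD + GET + GET + DELETE per tag
proposed = 1 * n   # single DELETE /v2/<repository>/tags/reference/<tag>

puts current              # 400
puts proposed             # 100
puts current - proposed   # 300 requests saved
```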

It's also important to note that each of these network requests can itself translate into multiple additional requests. Some of them, like fetching the tag manifest and configuration (used in the current implementation), require the registry to traverse the storage backend (using multiple requests) while looking for the corresponding files.

The throughput results below show how big the impact of this path traversal is when using a remote cloud storage backend.

Throughput

Setup

  1. Set up a GDK instance (instructions), including a container registry (instructions).

  2. We'll use the gitlab-org/gitlab-test sample project for this demonstration. Make sure there are no container repositories for this project.

  3. Upload 100 random unique tags to the gitlab-org/gitlab-test repository. To do this we used the following script (assuming that the container registry is listening at $REGISTRY_ADDR):

    echo "FROM busybox:latest
    RUN head -5 /dev/urandom > data" > Dockerfile
    
    for i in {1..100}
    do
        docker build --no-cache -t $REGISTRY_ADDR/gitlab-org/gitlab-test:$i .
        docker push $REGISTRY_ADDR/gitlab-org/gitlab-test:$i
    done
  4. Open a Rails console and make sure the repository has been populated with 100 tags as expected:

    bundle exec rails c
    project = Project.find_by_full_path 'gitlab-org/gitlab-test'
    repository = project.container_repositories.first
    repository.tags.count
    => 100

Methodology

To test throughput, we created a sample repository with 100 unique tags as described above, and then invoked Projects::ContainerRepository::CleanupTagsService from the Rails console, measuring the elapsed time. No filters were used, so all 100 tags are removed:

    bundle exec rails c
    project = Project.find_by_full_path 'gitlab-org/gitlab-test'
    repository = project.container_repositories.first
    params = { "name_regex" => ".*" }
    service = Projects::ContainerRepository::CleanupTagsService.new(project, User.first, params)
    Benchmark.measure { puts service.execute(repository) }.real

Each test was run twice (reloading the Rails console session and repopulating the registry between runs) and the average elapsed time (in seconds) was calculated.

We have performed these tests using the filesystem and a GCS bucket as the registry storage backend. This allows us to see the impact of reducing the number of network requests required to delete the tags.

Results

| Storage Backend | Master branch | This MR | Improvement |
|-----------------|---------------|---------|-------------|
| Filesystem      | 1.3s          | 0.5s    | 62%         |
| GCS             | 428.4s        | 24.1s   | 94%         |

Considering the results above, the proposed approach in this MR is up to 94% faster than the current implementation.
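The improvement column can be reproduced from the averaged timings (rounded to the nearest percent):

```ruby
# Relative improvement computed from the measured average elapsed times.
def improvement(before, after)
  ((before - after) / before * 100).round
end

puts improvement(1.3, 0.5)     # Filesystem => 62
puts improvement(428.4, 24.1)  # GCS        => 94
```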

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • ~~Label as security and @ mention @gitlab-com/gl-security/appsec~~
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by João Pereira
