
Support container repositories deduplicated size

Context

As part of the ongoing storage visibility efforts, we want to display the storage used by the image repositories of a given project.

Image repository objects are data that live in a different backend: the Container Registry. So, to get that size, we have to call the Container Registry API. Computing the size of everything below a given path is not part of the standard Container Registry API. As such, the intended endpoint lives in the "gitlab" API (named this way because this API only exists in the GitLab fork of the Container Registry).

From the documentation, we can see that we want to use the self_with_descendants option. To use that option, we need to be cautious: a specific JWT token must be used.
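To make the two requirements above concrete, here is a minimal sketch of what the request needs: a URL on the "gitlab" API asking for the self_with_descendants size, and a token whose access claim covers the nested repositories. The helper names are hypothetical, and the endpoint shape (GET /gitlab/v1/repositories/<path>/?size=...) and the "<path>/*" pull scope are assumptions based on the Container Registry API documentation, not the actual client code.

```ruby
require "uri"

# Hypothetical helper: URL of the "gitlab" API repository details endpoint.
# Assumption: the endpoint is GET /gitlab/v1/repositories/<path>/ and the
# `size` query parameter selects between "self" and "self_with_descendants".
def repository_details_url(registry_base_url, path, sizing: "self_with_descendants")
  uri = URI.join(registry_base_url, "/gitlab/v1/repositories/#{path}/")
  uri.query = URI.encode_www_form(size: sizing)
  uri.to_s
end

# Hypothetical helper: the JWT "access" claim (Docker token format) we assume
# is required for self_with_descendants, i.e. pull access on "<path>/*"
# rather than on "<path>" alone.
def nested_pull_access(path)
  [{ "type" => "repository", "name" => "#{path}/*", "actions" => ["pull"] }]
end
```

For example, repository_details_url("http://registry.test:5000", "group/project") builds a URL ending in /gitlab/v1/repositories/group/project/?size=self_with_descendants.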

So the overall plan is:

  1. Generate the right token to access that self_with_descendants option. Done in !83756 (merged)
  2. Update the existing registry client and expose a function on the Project model to get that size. 👈 We're here
  3. Update the existing APIs (REST and GraphQL) to expose that function.

This MR is step (2.): it simply provides a function on the Project object that returns the deduplicated size of the project's image repositories. That function is not used here; it will be used in MR (3.).

Steps (2.) and (3.) are described in #347351 (closed)

Design choices

For starters, we're going to leave permission checks aside. They will be the responsibility of the callers of the new Project function we introduce here.

The GitLab API registry client is designed to use JWT tokens for the different endpoints. Because we will not be at the ContainerRepository level but above it (Project), we decided to simply add a class method that calls the Container Registry API by leveraging the existing #repository_details function.

However, the #repository_details function is only available on registry client instances, so we need to build a "dummy" registry client. There is already a helper for that (with_dummy_client). We will need to update that helper so that we can tell it which JWT token to use.
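The following is a simplified, self-contained sketch of that design: with_dummy_client picks a token builder based on a token configuration, and a deduplicated_size class method reuses the instance-level repository_details. FakeGitlabApiClient, TOKEN_BUILDERS and the canned responses are hypothetical stand-ins; the real client performs HTTP calls and obtains tokens from Auth::ContainerRegistryAuthenticationService.

```ruby
# Hypothetical stand-in for ContainerRegistry::GitlabApiClient: no HTTP,
# no real JWT, only the control flow of with_dummy_client + deduplicated_size.
class FakeGitlabApiClient
  class << self
    # Canned registry answers keyed by [path, sizing]; replaces real HTTP calls.
    attr_accessor :responses
  end
  self.responses = {}

  # Hypothetical token builders standing in for the authentication service
  # (e.g. pull_nested_repositories_access_token for the nested case).
  TOKEN_BUILDERS = {
    full_access_token: ->(_path) { "full-access-token" },
    nested_repositories_token: ->(path) { "pull-token-for-#{path}/*" }
  }.freeze

  attr_reader :token

  def initialize(token:)
    @token = token
  end

  # Stand-in for #repository_details: returns the canned parsed JSON body, or nil.
  def repository_details(path, sizing: nil)
    self.class.responses[[path, sizing]]
  end

  # The caller now chooses which token the dummy client is built with.
  def self.with_dummy_client(token_config: { type: :full_access_token, path: nil })
    builder = TOKEN_BUILDERS.fetch(token_config[:type]) # KeyError on unknown types
    yield new(token: builder.call(token_config[:path]))
  end

  # Class-level entry point used at the Project level.
  def self.deduplicated_size(path)
    with_dummy_client(token_config: { type: :nested_repositories_token, path: path }) do |client|
      client.repository_details(path, sizing: :self_with_descendants)&.dig("size_bytes")
    end
  end
end
```

Defaulting token_config to the full access token keeps existing callers of with_dummy_client unchanged, while the new method opts into the nested repositories token.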

Lastly, we will call this registry class method from the new Project function. We will apply the same conditions we have in ContainerRepository#size, which makes essentially the same API call, but for a single image repository only. That method has the following guards:

  1. We're on gitlab.com
    • That's because the Container Registry GitLab API is currently only available on gitlab.com.
  2. We can return 0 if there are no image repositories connected to the project.
  3. Check if all image repositories are migrated.
  4. Check that we're actually connected to a Container Registry that supports the GitLab API (basically, that we're connected to the GitLab Container Registry).
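The guards above can be sketched as follows. The keyword arguments are hypothetical stand-ins for Gitlab.com?, the project's container repositories, the registry feature check and the registry call; the point is only the order of the checks and the fact that the registry is contacted last, through a lambda, once every cheaper check has passed.

```ruby
# Hypothetical record standing in for a ContainerRepository row.
RepoStub = Struct.new(:migrated, keyword_init: true)

# Sketch of the guard order for the new Project function (plain Ruby, no Rails).
def container_repositories_size(gitlab_com:, repositories:, gitlab_api_supported:, size_lookup:)
  return nil unless gitlab_com                     # 1. GitLab API only on gitlab.com
  return 0 if repositories.empty?                  # 2. nothing to measure
  return nil unless repositories.all?(&:migrated)  # 3. avoid partial results
  return nil unless gitlab_api_supported           # 4. GitLab Container Registry only

  size_lookup.call # only now do we call the registry
end
```

Returning nil (rather than 0) when a guard fails lets callers distinguish "unknown size" from "no image repositories".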

A word on check (3.): there is an ongoing migration in the Container Registry that aims to collect some image repository metadata in a database. All recent image repositories are already created there, and we're currently migrating the older existing data.

To compute the deduplicated size, the Container Registry only considers the image repositories that are in the metadata database. We decided to return a size only when all image repositories are in that metadata database. If only some of them are present, we will not return a size (the call would return a "partial" result, which is not very useful).
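An in-memory sketch of the ContainerRepository.all_migrated? check is below. The predicate mirrors the where clause of the partial index added by this MR's migration (tmp_idx_container_repos_on_non_migrated): migration_state != 'import_done' AND created_at < '2022-01-23'. Treating repositories created after that date as already in the metadata database is our reading of that clause; the real method is presumably a database query served by that index, not an array scan.

```ruby
require "date"

# Repositories created on or after this date are assumed to live in the
# metadata database already (taken from the partial index's where clause).
MIGRATION_CUTOFF = Date.new(2022, 1, 23)

ContainerRepoRecord = Struct.new(:path, :migration_state, :created_at, keyword_init: true)

# Same condition as the partial index: an older repository not yet imported.
def non_migrated?(repo)
  repo.migration_state != "import_done" && repo.created_at < MIGRATION_CUTOFF
end

# True only when no repository still needs migrating (also true for an empty set).
def all_migrated?(repos)
  repos.none? { |repo| non_migrated?(repo) }
end
```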

🔬 What does this MR do and why?

  • Update ContainerRegistry::GitlabApiClient.with_dummy_client so that we can use tokens from Auth::ContainerRegistryAuthenticationService.pull_nested_repositories_access_token.
  • Add ContainerRegistry::GitlabApiClient.deduplicated_size that leverages ContainerRegistry::GitlabApiClient#repository_details.
  • Add Project#container_repositories_deduplicated_size.
  • Add ContainerRepository.all_migrated? that checks if all image repositories are migrated.
  • Update all the related specs.

🖼 Screenshots or screen recordings

n/a

How to set up and validate locally

The custom API is still under heavy development and depends on the existence of a database for the Container Registry. GDK does not handle these details yet, so we need to set up the Container Registry server manually.

  1. Set up the container registry in GDK.
  2. Start everything $ gdk start
  3. Stop the Container Registry from gdk: $ gdk stop registry
  4. Check out the Container Registry project and build the binaries with $ make
  5. Update <gdk_root>/registry/config.yml with:
    database:
      enabled: true
      host: <path to gdk root>/postgresql
      port: 5432
      user:
      dbname: registry
      password:
    gc:
      disabled: true # just to reduce the noise
  6. Connect to psql with $ gdk psql
  7. Create the registry database with CREATE DATABASE registry;
  8. Execute the registry migrations from within the Container Registry project with: $ ./bin/registry database migrate up <path to gdk root>/registry/config.yml
  9. Start the Container Registry from its project with: $ ./bin/registry serve <gdk_root>/registry/config.yml
  10. Ensure that you have no errors in the logs
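If the registry refuses to start, a quick way to double-check the settings from step 5 is to parse config.yml. This is a hypothetical helper, not part of GDK or the registry; it only checks the keys used in this walkthrough.

```ruby
require "yaml"

# Hypothetical sanity check for the settings edited in step 5.
# Returns a list of human-readable problems (empty when everything looks fine).
def registry_config_problems(yaml_text)
  config = YAML.safe_load(yaml_text) || {}
  db = config["database"] || {}
  problems = []
  problems << "database.enabled must be true" unless db["enabled"] == true
  problems << "database.dbname should be registry" unless db["dbname"] == "registry"
  problems << "gc.disabled is not set (optional, only reduces noise)" unless config.dig("gc", "disabled") == true
  problems
end

# Usage (replace the placeholder path):
#   puts registry_config_problems(File.read("<gdk_root>/registry/config.yml"))
```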

Setup complete.

  1. Build an image and push it to any project.
    • Push as many images/tags as you want.
    • Due to container-registry#643 (closed), please push a "root" image repository (an image repository whose path is exactly the project full path).
  2. Verify on the UI that you can see the image.
  3. In a rails console, verify that Rails detected that a GitLab Container Registry is running:
    ApplicationSetting.first.container_registry_features
    # run these lines if the above DIDN'T return gitlab_v1_api
    UpdateContainerRegistryInfoService.new.execute
    ApplicationSetting.first.container_registry_features # -> ["tag_delete", "gitlab_v1_api"]
  4. In Project#container_repositories_size, temporarily comment out next unless Gitlab.com? (your local instance is not gitlab.com).
  5. Let's check the project image repositories size:
    Project.find(<project_id>).container_repositories_size # -> 2814559
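To interpret the number returned above: it is a deduplicated size, so a blob shared by several image repositories below the project path is counted once. The sketch below illustrates the idea with a hypothetical data layout (repository path mapped to blob digest and size); it assumes deduplication happens at the blob (layer) level and is not how the Container Registry computes it internally.

```ruby
# Hypothetical layout: { repository_path => { blob_digest => size_in_bytes } }.
# Blobs are content-addressed, so one digest always has one size.
def deduplicated_size(repositories)
  unique_blobs = {}
  repositories.each_value do |blobs|
    blobs.each { |digest, size| unique_blobs[digest] = size }
  end
  unique_blobs.values.sum
end

# For comparison: summing each repository on its own counts shared blobs twice.
def naive_size(repositories)
  repositories.values.sum { |blobs| blobs.values.sum }
end

repos = {
  "group/project" => { "sha256:base" => 100, "sha256:app" => 20 },
  "group/project/worker" => { "sha256:base" => 100, "sha256:worker" => 5 }
}
deduplicated_size(repos) # => 125 (the shared base layer is counted once)
naive_size(repos)        # => 225
```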

📐 MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

💽 Database review

Migration up

== 20220405125459 AddNonMigratedIndexToContainerRepositories: migrating =======
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:container_repositories, [:project_id, :id], {:name=>"tmp_idx_container_repos_on_non_migrated", :where=>"migration_state != 'import_done' AND created_at < '2022-01-23'", :algorithm=>:concurrently})
   -> 0.0075s
-- execute("SET statement_timeout TO 0")
   -> 0.0006s
-- add_index(:container_repositories, [:project_id, :id], {:name=>"tmp_idx_container_repos_on_non_migrated", :where=>"migration_state != 'import_done' AND created_at < '2022-01-23'", :algorithm=>:concurrently})
   -> 0.0045s
-- execute("RESET statement_timeout")
   -> 0.0006s
== 20220405125459 AddNonMigratedIndexToContainerRepositories: migrated (0.0198s)

Migration down

== 20220405125459 AddNonMigratedIndexToContainerRepositories: reverting =======
-- transaction_open?()
   -> 0.0000s
-- indexes(:container_repositories)
   -> 0.0078s
-- execute("SET statement_timeout TO 0")
   -> 0.0007s
-- remove_index(:container_repositories, {:algorithm=>:concurrently, :name=>"tmp_idx_container_repos_on_non_migrated"})
   -> 0.0033s
-- execute("RESET statement_timeout")
   -> 0.0006s
== 20220405125459 AddNonMigratedIndexToContainerRepositories: reverted (0.0204s) 

🔬 Queries

Edited by David Fernandez
