Support container repositories deduplicated size
⛳ Context
As part of the ongoing storage visibility efforts, we want to be able to display the storage used by image repositories in a given project.
Image repository objects are data that live in a different backend: the Container Registry. So, to get that size, we have to query the Container Registry API. Computing the size below a given path is not part of the standard Container Registry API. As such, the intended endpoint lives in the "gitlab" API (so named because this API only exists on the GitLab fork of the Container Registry).
From the documentation, we can see that we want to use the self_with_descendants option. To use that option, we need to be cautious: a specific JWT token must be used.
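As a rough sketch of what such a request could look like, the snippet below builds the request without sending it. The /gitlab/v1/repositories/<path>/ route and the size query parameter reflect our understanding of the Container Registry GitLab API documentation and should be treated as assumptions here. The registry URL, the token value, and the build_size_request helper are hypothetical.

```ruby
require "uri"
require "net/http"

# Hypothetical helper: builds (but does not send) a request asking for the
# size of the repository at `path` plus everything below it.
def build_size_request(registry_url, path, token)
  # Assumed route of the "gitlab" API on the GitLab Container Registry.
  uri = URI.join(registry_url, "/gitlab/v1/repositories/#{path}/")
  uri.query = URI.encode_www_form(size: "self_with_descendants")

  request = Net::HTTP::Get.new(uri)
  # The JWT has to be the specific token that grants access to the
  # nested repositories; a regular repository token is not enough.
  request["Authorization"] = "Bearer #{token}"
  request
end

request = build_size_request("http://registry.example.test:5000", "my-group/my-project", "<jwt>")
puts request.uri
```

Keeping request construction separate from sending makes it easy to inspect which option and which token end up on the wire.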
So the overall plan is:
1. Generate the right token to access that self_with_descendants option. ✅ Done in !83756 (merged)
2. Update the existing registry client and expose a function on the Project model to get that. 👈 We're here
3. Update existing APIs (REST and GraphQL) to expose that function. ⌛
This MR is step (2.). It simply provides a function on the Project object that returns the deduplicated size of the image repositories. That function will not be used here. It will be used in MR (3.).
Steps (2.) and (3.) are described in #347351 (closed)
⚔ Design choices
For starters, we're going to leave permission checks aside. They will be the responsibility of the callers of the new Project function we introduce here.
The GitLab API registry client is designed to use JWT tokens for the different endpoints. Because we will not be at the ContainerRepository level but above it (Project), we decided to simply have a class method that calls the Container Registry API and leverages the existing #repository_details function.
Now, that #repository_details function is only available on a registry client instance, so we need to build a "dummy" registry client. There is already a helper for that (with_dummy_client). We will need to update that helper so that we can tell it which JWT token to use.
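To make the shape of this design concrete, here is a minimal, self-contained sketch. It is not the actual client: SketchRegistryClient, token_for, the token_config and sizing keywords, the token strings, and the canned response are all illustrative assumptions, and no HTTP call is made.

```ruby
# Stand-in for the registry client; every name below is illustrative.
class SketchRegistryClient
  attr_reader :token

  def initialize(api_url, token:)
    @api_url = api_url
    @token = token
  end

  # Hypothetical token factory: picks a JWT based on the requested type.
  def self.token_for(type:, path:)
    case type
    when :full_access_token then "full-access-jwt"
    when :nested_repositories_token then "nested-jwt-for-#{path}"
    else raise ArgumentError, "unknown token type: #{type}"
    end
  end

  # Builds a throwaway client with the token selected by the caller and
  # yields it, returning whatever the block returns.
  def self.with_dummy_client(token_config: { type: :full_access_token, path: nil })
    yield new("http://registry.example.test", token: token_for(**token_config))
  end

  # Canned response standing in for the real API call.
  def repository_details(path, sizing: nil)
    details = { "path" => path }
    details["size_bytes"] = 2_814_559 if sizing == :self_with_descendants
    details
  end

  # Class method: no ContainerRepository instance is needed, only a path.
  def self.deduplicated_size(path)
    with_dummy_client(token_config: { type: :nested_repositories_token, path: path }) do |client|
      client.repository_details(path, sizing: :self_with_descendants)["size_bytes"]
    end
  end
end

puts SketchRegistryClient.deduplicated_size("my-group/my-project")
```

The point of the sketch is the flow: the class method chooses the token, the helper builds the dummy client with it, and the existing instance method does the actual lookup.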
Lastly, we will call this registry class method from the new Project function. We will apply the same conditions we have in ContainerRepository#size, which is basically the exact same API call but for a single image repository only. We can see that we have some guards there:
1. We're on gitlab.com.
   - That's because the Container Registry GitLab API is currently only available on gitlab.com.
2. We can return 0 if there are no image repositories connected to the project.
3. Check if all image repositories are migrated.
4. Check that we're actually connected to a Container Registry that supports the GitLab API (basically, that we're connected to the GitLab Container Registry).
A word on check (3.). There is an ongoing migration in the Container Registry. It aims to collect some image repository metadata in a database. All recent image repositories are already created there and we're currently migrating the older existing data.
To get the deduplicated size, the Container Registry will only consider the image repositories that are in the metadata database. We decided to return a size only when all image repositories are in that metadata database. If only some of them are present, we will not return a size: the actual call would return a "partial" result, which is not very useful.
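The guards described above can be sketched as follows. This is only an illustration of the guard order under stated assumptions: SketchProject and its fields are hypothetical stand-ins injected as plain values, so the logic can run without Rails or a registry.

```ruby
# Illustrative stand-in for the Project function. Collaborators are passed
# in as plain values so each guard can be exercised in isolation.
SketchProject = Struct.new(
  :on_gitlab_com, :repositories, :registry_supports_gitlab_api, :registry_size,
  keyword_init: true
) do
  def container_repositories_deduplicated_size
    return unless on_gitlab_com                           # guard 1: gitlab.com only
    return 0 if repositories.empty?                       # guard 2: nothing to measure
    return unless repositories.all? { |r| r[:migrated] }  # guard 3: no partial results
    return unless registry_supports_gitlab_api            # guard 4: GitLab API available

    registry_size # stands in for the call to the registry class method
  end
end

migrated = [{ migrated: true }, { migrated: true }]
partial  = [{ migrated: true }, { migrated: false }]

full = SketchProject.new(on_gitlab_com: true, repositories: migrated,
                         registry_supports_gitlab_api: true, registry_size: 2_814_559)
half = SketchProject.new(on_gitlab_com: true, repositories: partial,
                         registry_supports_gitlab_api: true, registry_size: 2_814_559)

p full.container_repositories_deduplicated_size
p half.container_repositories_deduplicated_size
```

Note the difference between the two "no value" outcomes: an empty project yields 0, while a project with non-migrated repositories yields nil, so callers can tell "no storage used" apart from "size unknown".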
🔬 What does this MR do and why?
- Update ContainerRegistry::GitlabApiClient.with_dummy_client so that we can use tokens from Auth::ContainerRegistryAuthenticationService.pull_nested_repositories_access_token.
- Add ContainerRegistry::GitlabApiClient.deduplicated_size that leverages ContainerRegistry::GitlabApiClient#repository_details.
- Add Project#container_repositories_deduplicated_size.
- Add ContainerRepository.all_migrated? that checks if all image repositories are migrated.
- Update all the related specs.
🖼 Screenshots or screen recordings
n / a
⚙ How to set up and validate locally
The custom API is currently under heavy development and depends on the existence of a database for the Container Registry. These details are currently not handled by GDK, so we need to set up the Container Registry server manually.
- Set up the container registry in GDK.
- Start everything
$ gdk start
- Stop the Container Registry from gdk:
$ gdk stop
- Check out the Container Registry project and build the binaries with
$ make
- Update <gdk_root>/registry/config.yml with:
database:
  enabled: true
  host: <path to gdk root>/postgresql
  port: 5432
  user:
  dbname: registry
  password:
gc:
  disabled: true # just to reduce the noise
- Connect to psql with
$ gdk psql
- Create the registry database with
CREATE DATABASE registry;
- Execute the registry migrations within the container registry project with:
$ ./bin/registry database migrate up <path to gdk root>/registry/config.yml
- Start the container registry from its project with:
$ ./bin/registry serve <gdk_root>/registry/config.yml
- Ensure that you have no errors in the logs
Setup complete.
- Build an image and push it to any project.
- Push as many images/tags as you want.
- Due to container-registry#643 (closed), please push a "root" image repository (an image repository whose path is exactly the project full path).
- Verify on the UI that you can see the image.
- In a Rails console, verify that Rails detected that a GitLab Container Registry is running:
ApplicationSetting.first.container_registry_features
# run these lines if the above DIDN'T return gitlab_v1_api
UpdateContainerRegistryInfoService.new.execute
ApplicationSetting.first.container_registry_features # -> ["tag_delete", "gitlab_v1_api"]
- In Project#container_repositories_size, comment out next unless Gitlab.com?
- Let's check the project image repositories size:
Project.find(<project_id>).container_repositories_size # -> 2814559
📐 MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
💽 Database review
⤴ Migration up
== 20220405125459 AddNonMigratedIndexToContainerRepositories: migrating =======
-- transaction_open?()
-> 0.0000s
-- index_exists?(:container_repositories, [:project_id, :id], {:name=>"tmp_idx_container_repos_on_non_migrated", :where=>"migration_state != 'import_done' AND created_at < '2022-01-23'", :algorithm=>:concurrently})
-> 0.0075s
-- execute("SET statement_timeout TO 0")
-> 0.0006s
-- add_index(:container_repositories, [:project_id, :id], {:name=>"tmp_idx_container_repos_on_non_migrated", :where=>"migration_state != 'import_done' AND created_at < '2022-01-23'", :algorithm=>:concurrently})
-> 0.0045s
-- execute("RESET statement_timeout")
-> 0.0006s
== 20220405125459 AddNonMigratedIndexToContainerRepositories: migrated (0.0198s)
⤵ Migration down
== 20220405125459 AddNonMigratedIndexToContainerRepositories: reverting =======
-- transaction_open?()
-> 0.0000s
-- indexes(:container_repositories)
-> 0.0078s
-- execute("SET statement_timeout TO 0")
-> 0.0007s
-- remove_index(:container_repositories, {:algorithm=>:concurrently, :name=>"tmp_idx_container_repos_on_non_migrated"})
-> 0.0033s
-- execute("RESET statement_timeout")
-> 0.0006s
== 20220405125459 AddNonMigratedIndexToContainerRepositories: reverted (0.0204s)
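The condition on the temporary index above suggests how the "all migrated" check could be expressed. The sketch below is a plain-Ruby illustration and not the model code: it assumes that ContainerRepository.all_migrated? relies on the same predicate as the index (migration_state != 'import_done' AND created_at < '2022-01-23'). SketchRepository, CUTOFF, and the non-"import_done" state names are hypothetical.

```ruby
require "date"

# Cut-off date copied from the partial index condition.
CUTOFF = Date.new(2022, 1, 23)

# Hypothetical stand-in for a container repository row.
SketchRepository = Struct.new(:migration_state, :created_at, keyword_init: true)

# True when no repository matches the index predicate, i.e. none is both
# older than the cut-off and not yet in the 'import_done' state.
def all_migrated?(repositories)
  repositories.none? do |repo|
    repo.migration_state != "import_done" && repo.created_at < CUTOFF
  end
end

old_migrated = SketchRepository.new(migration_state: "import_done", created_at: Date.new(2021, 6, 1))
old_pending  = SketchRepository.new(migration_state: "default", created_at: Date.new(2021, 6, 1))
recent       = SketchRepository.new(migration_state: "default", created_at: Date.new(2022, 3, 1))

p all_migrated?([old_migrated, recent])
p all_migrated?([old_migrated, old_pending])
```

Under this assumption, repositories created on or after the cut-off never block the check, which matches the statement that recent image repositories are already created in the metadata database.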