Expose deduplicated size of grouped image repositories through the API
Context
We are rolling out a new version of the Container Registry on GitLab.com that includes a metadata database (&5523 (closed)). Among other things, this database will allow for efficient storage usage calculations.
This issue is part of a work plan to expose the deduplicated size of grouped image repositories in the GitLab UI and API (&7228 (closed)).
What is the deduplicated size of grouped image repositories? Say we have a my-group/my-project container repository. Project-level container repositories can have up to three sub-levels of container repositories. For example, suppose we also have my-group/my-project/foo and my-group/my-project/bar. In this case, to know the deduplicated size of all repositories under a given project, we have to query the Container Registry for the deduplicated size of my-group/my-project and its descendants (if any).
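To make the idea concrete, here is a minimal Ruby sketch of what "deduplicated size" means for the example above. The blob digests and sizes are made up, and the helpers `in_group?` and `deduplicated_size` are hypothetical. The sketch assumes a repository's size is the total of the layer blobs its images reference, with each distinct blob counted once across the whole group. In practice the registry computes this from its metadata database.

```ruby
# Hypothetical, simplified data: each repository path maps to the layer
# blobs (digest => size in bytes) referenced by its images.
REPOSITORIES = {
  "my-group/my-project"     => { "sha256:aaa" => 100, "sha256:bbb" => 50 },
  "my-group/my-project/foo" => { "sha256:aaa" => 100, "sha256:ccc" => 30 },
  "my-group/my-project/bar" => { "sha256:bbb" => 50 },
  "my-group/other-project"  => { "sha256:ddd" => 70 }
}.freeze

# A repository belongs to the group of `base` if it is `base` itself or
# any path nested below it (the "/" prevents matching "my-project-2").
def in_group?(path, base)
  path == base || path.start_with?("#{base}/")
end

# Deduplicated size: every distinct blob digest is counted once, even if
# several repositories in the group reference it.
def deduplicated_size(repositories, base)
  unique_blobs = {}
  repositories.each do |path, blobs|
    next unless in_group?(path, base)

    unique_blobs.merge!(blobs)
  end
  unique_blobs.values.sum
end
```

Summing each repository of the my-group/my-project group separately would give 330 bytes, while counting the shared blobs once gives 180 bytes.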
The Container Registry has a new API under the /gitlab/v1/ prefix, documented here. This API includes a Get repository details operation, which retrieves details about an image repository, including its deduplicated size. To include the repository's descendants in that size, the size query parameter of this operation must be set to self_with_descendants instead of self.
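A minimal sketch of the request and response handling, using only the Ruby standard library. The helper names are hypothetical, and the URL layout /gitlab/v1/repositories/<path>/ is an assumption to be checked against the registry API documentation; the size values and the size_bytes response attribute come from this issue.

```ruby
require "json"
require "uri"

# Allowed values of the `size` query parameter described above.
SIZE_OPTIONS = %w[self self_with_descendants].freeze

# Hypothetical helper: builds the URL for the Get repository details
# operation. ASSUMPTION: the /gitlab/v1/repositories/<path>/ layout.
def repository_details_url(registry_base, path, size: "self_with_descendants")
  raise ArgumentError, "invalid size option: #{size}" unless SIZE_OPTIONS.include?(size)

  uri = URI.join(registry_base, "/gitlab/v1/repositories/#{path}/")
  uri.query = URI.encode_www_form(size: size)
  uri.to_s
end

# Hypothetical helper: extracts the deduplicated size from a JSON body.
def size_bytes_from(body)
  JSON.parse(body).fetch("size_bytes")
end
```

For my-group/my-project, this produces a URL ending in /gitlab/v1/repositories/my-group/my-project/?size=self_with_descendants.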
Proposal
Once the Container Registry API is updated to allow querying the size of grouped repositories (container-registry#519 (closed)), we should:
- Add support for obtaining the size of grouped image repositories to the Rails container registry client class, using the Get repository details operation of the new Container Registry API. #353555 (closed) must be done first.
  Support for this registry operation and its size parameter was already added as part of #347349 (closed). Therefore, in this issue, all we have to do is add support for the self_with_descendants option of the size query parameter.
- Expose the size of grouped image repositories in the response of the GitLab API Get details of a single repository operation.
  For the time being, this requires an HTTP request to the new Container Registry API for each image repository. Therefore, we should start small and only add support for this to the Get details of a single repository operation, leaving the List registry repositories operation for later. Additionally, we should guard this new behavior behind an optional query parameter, descendants (boolean), which should be paired with size:

      curl --header "PRIVATE-TOKEN: <your_access_token>" \
        "https://gitlab.example.com/api/v4/registry/repositories/2?size=true&descendants=true"

  We should then fill the size response attribute with the size returned from the registry (in size_bytes).
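The endpoint behavior proposed above can be sketched in Ruby as follows. This is not the GitLab implementation: `repository_response`, `truthy?` and the `fetch_details` callable (a stand-in for the Rails container registry client) are hypothetical. Only the new descendants path is modeled, and the existing size behavior without descendants is left out.

```ruby
# Hypothetical: treats a query parameter as true only for the string "true".
def truthy?(value)
  value.to_s == "true"
end

# Hypothetical sketch of the single-repository endpoint. `fetch_details`
# receives a repository path and a size option and returns the parsed
# JSON response of the registry's Get repository details operation.
def repository_response(repository, params, fetch_details)
  response = { "id" => repository[:id], "path" => repository[:path] }

  # The new behavior only applies when `descendants` is paired with `size`,
  # so the registry is not contacted otherwise.
  if truthy?(params["size"]) && truthy?(params["descendants"])
    details = fetch_details.call(repository[:path], "self_with_descendants")
    response["size"] = details["size_bytes"]
  end

  response
end
```

Injecting the client as a callable keeps the sketch testable without a running registry and makes the "one HTTP request per repository" cost explicit.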
Important Notes
- For now, this operation will only be available on GitLab.com, and only for image repositories created on (or imported into) the new container registry platform backed by the metadata database. The Container Registry API replies with a 404 Not Found if the base repository does not exist on the new platform. To avoid making HTTP requests just to find out that the information is not available, in Rails we should only allow this operation when Gitlab.com? is true and the base container repository was created after ContainerRepository::MIGRATION_PHASE_1_ENDED_AT (which should be set to 2022-01-22).
- When querying for the aggregated size of my-group/my-project (the base repository), it's possible that my-group/my-project/foo is on the new platform but my-group/my-project/bar is not. In this case, the returned size will be incomplete. There are two ways to handle this:
  - Restrict this operation to base repositories whose sub-repositories were all created after ContainerRepository::MIGRATION_PHASE_1_ENDED_AT. This guarantees that the returned size is accurate and complete, as all repositories live on the new platform.
  - Expand the evaluation conditions to take into account the migration status of repositories created before ContainerRepository::MIGRATION_PHASE_1_ENDED_AT.
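The guards from both notes can be sketched as plain predicates. This is illustrative only. The `gitlab_com:` keyword stands in for `Gitlab.com?`, and the `Repository` struct stands in for a container repository record; both are hypothetical. The cutoff mirrors ContainerRepository::MIGRATION_PHASE_1_ENDED_AT, with midnight UTC assumed as the time of day. The second predicate implements the first option from the second note.

```ruby
# Mirrors ContainerRepository::MIGRATION_PHASE_1_ENDED_AT.
# ASSUMPTION: midnight UTC on 2022-01-22.
MIGRATION_PHASE_1_ENDED_AT = Time.utc(2022, 1, 22)

# Hypothetical stand-in for a container repository record.
Repository = Struct.new(:path, :created_at, keyword_init: true)

# First note: only query the registry on GitLab.com and for base
# repositories created after the end of migration phase 1.
def descendants_size_available?(base, gitlab_com:)
  gitlab_com && base.created_at > MIGRATION_PHASE_1_ENDED_AT
end

# Second note, first option: also require every sub-repository to have
# been created after the cutoff, so the returned size is complete.
def complete_size_guaranteed?(base, sub_repositories, gitlab_com:)
  descendants_size_available?(base, gitlab_com: gitlab_com) &&
    sub_repositories.all? { |repo| repo.created_at > MIGRATION_PHASE_1_ENDED_AT }
end
```

With this option, a base repository created in February 2022 still fails the completeness check if my-group/my-project/bar predates the cutoff.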