# Persist and update container registry project-level usage in the `project_statistics` table
## Context
We are rolling out a new version of the Container Registry on GitLab.com that includes a metadata database (&5523 (closed)). Among other things, this database allows for efficient storage usage calculations.
This issue is part of a work plan to expose the deduplicated project-level usage in GitLab Rails.
## Problem
To unblock https://gitlab.com/gitlab-data/analytics/-/issues/12036, we need to persist the container registry project-level usage to the `project_statistics` table in the Rails database. A change to `project_statistics` will also be used later to trigger the namespace usage refresh.
## Proposal
### Background
At the moment, Rails queries the Container Registry API to obtain the size of a given image repository whenever a user queries its API or refreshes the UI.
This real-time/on-demand approach won't work for this particular case. Project statistics must be kept up to date, but a user may push many images to the registry and never, or only infrequently, visit a UI page or make an API call that would trigger fetching the current usage from the registry. So updating the project statistics value after retrieving it from the registry (in response to a user-triggered event) is not enough. Additionally, we should avoid doing database writes during a read API request, so that reads can be served by read-only replicas.
Alternatively, we could have a cron job that periodically retrieves the container registry usage for each project at a predefined cadence and updates the project statistics. However, this would be highly inefficient: at least on GitLab.com, most image repositories change infrequently, with more than 90% of Container Registry API requests being reads. So there is no point in doing this.
### Solution
For GitLab.com, the Container Registry already delivers async notifications to Rails whenever something is uploaded or deleted. These notifications are currently used for Snowplow metrics and Geo replication. They are delivered to the `/api/v4/container_registry_event/events` API endpoint and processed by the `API::ContainerRegistryEvent` and `ContainerRegistry::Event` classes.
The notification payload includes the timestamp, the action, and the target image repository. This is documented here, including examples.
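To make the fields we rely on concrete, here is a trimmed, illustrative notification envelope (field names follow the documented format; IDs, timestamps, and values are made up) and a minimal Ruby sketch of extracting them:

```ruby
require 'json'

# Trimmed, illustrative example of a registry notification envelope.
# Only the fields relevant to this issue are shown; values are made up.
RAW = <<~JSON
  {
    "events": [
      {
        "timestamp": "2022-05-03T10:15:30Z",
        "action": "push",
        "target": {
          "repository": "my-group/my-project/my-app",
          "tag": "latest"
        }
      }
    ]
  }
JSON

events = JSON.parse(RAW).fetch('events')
events.each do |event|
  target = event['target']
  puts "#{event['action']} #{target['repository']}:#{target['tag']}"
  # => push my-group/my-project/my-app:latest
end
```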
We can start processing these same notifications for storage usage purposes.
### Tasks

- Add a new `container_registry_size` column to `project_statistics`;
- Add a new `registry_size` method to the `Project` model. This method should trigger a `GET /gitlab/v1/repositories/my-group/my-project/?size=self_with_descendants` call to the Container Registry API, which will be possible based on the work from #347351 (closed) (which this issue is blocked by);
- Add a new `update_registry_size` method to `ProjectStatistics`. This method should call `Project#registry_size` and persist the returned size in `project_statistics.container_registry_size`;
- Modify `ContainerRegistry::Event` so that for each inbound notification where `target.repository` is e.g. `my-group/my-project/my-app` it will:
  - Parse the `action` within the notification payload. Only `push` or `delete` actions can increase or decrease the registry usage, so we should discard notifications with other actions;
  - Discard notifications that do not have a `target.tag` property. We only need to process tag creation and deletion events, as those are the only ones that can lead to an increase or decrease in usage;
  - Identify the corresponding `ContainerRepository` object and the enclosing `Project` object, based on the `target.repository` property of the notification payload;
  - Spawn an `UpdateProjectStatisticsWorker` background job to update the project statistics, passing an array of statistics that only targets the registry usage. In turn, the worker should execute `ProjectStatistics#update_registry_size`.
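The filtering steps above can be sketched as plain, framework-free Ruby. Both method names are hypothetical, and `project_path` assumes a flat `group/project/image` repository layout for illustration; the real implementation would resolve the `ContainerRepository` and `Project` records via the usual Rails lookups (nested groups make path splitting alone insufficient):

```ruby
# Actions that can change registry usage; everything else is discarded.
TRACKED_ACTIONS = %w[push delete].freeze

# Should this notification trigger a statistics refresh?
# Requires a tracked action and a non-empty target.tag.
def handleable?(event)
  TRACKED_ACTIONS.include?(event['action']) &&
    !event.dig('target', 'tag').to_s.empty?
end

# Simplified: derive the enclosing project path from target.repository,
# assuming a flat `group/project/image` layout (first two segments).
def project_path(repository_path)
  repository_path.split('/').first(2).join('/')
end

notification = {
  'action' => 'push',
  'target' => { 'repository' => 'my-group/my-project/my-app', 'tag' => 'v1.0' }
}
handleable?(notification)                   # => true
project_path('my-group/my-project/my-app')  # => "my-group/my-project"
```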
## Notes
One may ask why the size is not part of the notification sent from the registry. The deduplicated size of an image repository can only be calculated at runtime, and that calculation comes with a computational cost; it's not free (see container-registry#493 (closed) for an explanation).
For example, when cleanup policies run, a large number of tags can be deleted in a repository. Each of these tag deletions will trigger a notification. It would be counterproductive to calculate and include the size in all these notifications.
With this approach, we can stage (and deduplicate) bursts of notifications for the same target project on the Rails side and reduce the calls to the registry API. We intend to add caching to the container registry later on, which will allow us to simplify this process down the road. But for now, this is the most efficient option.
## Open Questions
How exactly to stage and deduplicate bursts of notifications for the same target project is still an open question.
For example, suppose a user pushes 5 new tags to each of 5 different repositories that belong to the same project at about the same time. In that case, Rails will receive 25 async notifications close to each other. Triggering a refresh of the project statistics for every single one of these notifications is inefficient. Ideally, we'd stage notifications for the same project for, say, 5 minutes, and once those 5 minutes pass, trigger the related project statistics refresh only once.
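One possible shape for this staging is an in-process sketch like the following: collect project IDs for a window and trigger a single refresh per project on flush. This `NotificationStager` class is hypothetical; in Rails this would more likely be built on Redis plus a scheduled worker, or on Sidekiq job deduplication:

```ruby
require 'set'

# Minimal sketch of staging: collect project IDs for a window and
# trigger one refresh per project when the window closes.
class NotificationStager
  def initialize
    @pending = Set.new
  end

  # Called once per inbound notification; the Set deduplicates.
  def stage(project_id)
    @pending.add(project_id)
  end

  # Called once per window (e.g. every 5 minutes); yields each project
  # exactly once, no matter how many notifications arrived for it.
  def flush
    @pending.each { |project_id| yield project_id }
    @pending.clear
  end
end

stager = NotificationStager.new
25.times { stager.stage(42) }  # 25 notifications for project 42
refreshed = []
stager.flush { |project_id| refreshed << project_id }
refreshed  # => [42] — a single refresh despite 25 notifications
```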
## Caveats
As a caveat, we know that the async registry notifications are currently only configured on GitLab.com. It's already possible to enable them on self-managed instances, but they are off by default. This will have to change in the future once we're ready to release the new registry for self-managed installs.
Additionally, these notifications don't come with delivery guarantees. We don't know whether this is a problem in practice or how often it occurs (if ever). We have an issue to improve this down the road (container-registry#335 (closed)). We should accept this caveat, as this is the best we can do; we'll find a solution if/when this becomes a real problem rather than just a possibility.