Scale the container tags cleanup network requests flow
Summary
Here are all the network calls made when a cleanup policy is executed:
1. Get the tags list (GET /v2/#{name}/tags/list)
   - The whole list is retrieved in a single call. No pagination. The list is simply an array of names.
2. For each tag, get its manifest (GET /v2/#{name}/manifests/#{reference})
   - The policy specifies how many tags to retain. To compute that, the backend needs the created timestamp of each tag and must order the list by it. The created timestamp is not returned by (1.), so one request per tag is needed to get this field.
3. For each tag to delete, delete it (DELETE /v2/#{name}/tags/reference/#{reference})
(1.) + (2.) is the preparation of the cleanup. In short, computing the list of tags to delete.
(3.) is the actual cleanup.
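The three steps can be sketched as follows. This is a minimal illustration, not the actual service code: FakeRegistryClient, its method names, and the cleanup helper are hypothetical, and the in-memory client only counts the requests a real run would send to the registry.

```ruby
# Hypothetical in-memory stand-in for the container registry API.
# It stores tag name => created timestamp and counts every simulated request.
class FakeRegistryClient
  attr_reader :request_count

  def initialize(tags)
    @tags = tags
    @request_count = 0
  end

  # (1.) GET the tags list: one request, returns only the names.
  def tag_names
    @request_count += 1
    @tags.keys
  end

  # (2.) GET the manifest of one tag to learn its created timestamp.
  def created_at(tag_name)
    @request_count += 1
    @tags.fetch(tag_name)
  end

  # (3.) DELETE one tag.
  def delete_tag(tag_name)
    @request_count += 1
    @tags.delete(tag_name)
  end
end

# Keeps the keep_n most recent tags and deletes the rest.
def cleanup(client, keep_n:)
  names = client.tag_names                                   # (1.)
  dated = names.map { |name| [name, client.created_at(name)] } # (2.) one request per tag
  newest_first = dated.sort_by { |_, created| created }.reverse
  to_delete = newest_first.drop(keep_n).map(&:first)
  to_delete.each { |name| client.delete_tag(name) }          # (3.) one request per deleted tag
  to_delete
end

now = Time.at(1_600_000_000)
tags = { "v1" => now - 300, "v2" => now - 200, "v3" => now - 100 }
client = FakeRegistryClient.new(tags)
deleted = cleanup(client, keep_n: 1)
# deleted              → ["v2", "v1"]
# client.request_count → 6 (1 list + 3 manifests + 2 deletes)
```

Note that the ordering step needs every timestamp before a single tag can be selected for deletion, which is why the preparation cannot be cut into independent chunks.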
Currently, (3.) is limited in time (250s for gitlab.com) to prevent resource starvation, but (1.) + (2.) is not limited at all and can't be split because, to compute the number of tags to be retained, the whole list has to be ordered.
This gives a total number of network requests of 2 * t + 1, where t is the number of tags.
We have a linear relationship and this will not work for repositories with a high number of tags to delete. For example, https://gitlab.com/gitlab-org/build/CNG-mirror/container_registry/135403 has 3K tags, so we would be doing 3K network requests as a strict minimum.
This is a blocker for "heavy" repositories.
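As a worked example of the formula, the count can be broken down into 1 list request, t manifest requests, and one delete request per removed tag. The helper name cleanup_requests is hypothetical and only illustrates the arithmetic.

```ruby
# Number of registry requests for one cleanup run:
# 1 (tags list) + tags (one manifest each) + deleted (one DELETE each).
def cleanup_requests(tags:, deleted:)
  raise ArgumentError, "deleted cannot exceed tags" if deleted > tags

  1 + tags + deleted
end

cleanup_requests(tags: 3_000, deleted: 0)     # → 3001, preparation only
cleanup_requests(tags: 3_000, deleted: 3_000) # → 6001, worst case 2 * t + 1
```

Even when nothing is deleted, the preparation alone costs t + 1 requests, which matches the 3K minimum quoted for the example repository.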
Improvements
Based on #288812 (comment 456138284):
- Hard limit the list of tags to process
  - Do it only if #238190 (closed) is enabled
  - Add proper logs
  - The limit should be an application setting and have a proper default. 500?
  - The trade-off is:
    - We could keep a tag that in the end would be deleted. As such, we're making more requests to the container registry overall than if we could handle the whole list at once.
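A minimal sketch of the hard limit, under stated assumptions: the helper truncate_tags, the constant DEFAULT_MAX_TAGS, and the log message format are hypothetical, the default of 500 is only the value floated above, and keeping the first entries of the list is an arbitrary choice made for illustration.

```ruby
require "logger"
require "stringio"

# Hypothetical default; in the proposal this would come from an application setting.
DEFAULT_MAX_TAGS = 500

# Returns at most `max` tag names and logs when the list had to be cut.
def truncate_tags(tag_names, max: DEFAULT_MAX_TAGS, logger: Logger.new($stdout))
  return tag_names if tag_names.size <= max

  logger.info("cleanup tags list truncated: original_size=#{tag_names.size} kept=#{max}")
  tag_names.first(max)
end

log_output = StringIO.new
names = (1..1_200).map { |i| "tag-#{i}" }
batch = truncate_tags(names, logger: Logger.new(log_output))
# batch.size → 500, and one log line records that 1200 tags were cut to 500
```

With the limit in place, the manifest requests per run are capped at the configured value, at the cost described in the trade-off: tags outside the processed batch are not ordered against the ones inside it.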
Risks
Regarding cleanup policies, the most important aspect is accuracy: don't delete tags that are not meant to be deleted.
Since this issue is about building the base list to apply the filters on and not the application of the filters itself, I would say that the risk is low.
Involved components
app/services/projects/container_repository/cleanup_tags_service.rb