Break Container Registry API / Delete tags in bulk work into smaller units of work
## Summary
Currently, the "delete repository tags in bulk" API at the project level enqueues a single job that reads the associated container registries, reads and filters the tags, and loops over them to delete them.
See #31097 (comment 232368406)
## Improvements
Let's say that the job has 1000 tags to delete. Right now, it loops over those 1000 tags and deletes them one by one.
We could break this loop into smaller jobs. For example, we could loop and enqueue jobs that each delete one or n tags. This way, the work is decomposed into smaller units and, depending on the number of workers attached to the queue, those units could be executed in parallel and thus reduce the processing time for all these tags (at the cost of increased queue load).
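As a rough illustration of the fan-out described above, here is a minimal sketch. `FakeQueue`, `DeleteTagsJob` and `enqueue_tag_deletions` are hypothetical names used only for this example; they are not existing classes, and a real implementation would go through the application's own background job framework.

```ruby
# Hypothetical in-memory stand-in for a background job queue.
class FakeQueue
  attr_reader :jobs

  def initialize
    @jobs = []
  end

  # Records the job instead of scheduling it.
  def enqueue(job_name, tag_names)
    @jobs << { job: job_name, tags: tag_names }
  end
end

# Splits the tag list into batches of `batch_size` tags and enqueues one
# (hypothetical) DeleteTagsJob per batch. Returns the number of jobs enqueued.
def enqueue_tag_deletions(queue, tag_names, batch_size:)
  raise ArgumentError, 'batch_size must be positive' unless batch_size.positive?

  batches = tag_names.each_slice(batch_size).to_a
  batches.each { |batch| queue.enqueue('DeleteTagsJob', batch) }
  batches.size
end

queue = FakeQueue.new
tags = (1..1000).map { |i| "tag-#{i}" }
puts enqueue_tag_deletions(queue, tags, batch_size: 100) # prints 10
```

With `batch_size: 1` this degenerates to one job per tag, which is the case the first risk below warns about.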
## Risks
- We can spike the number of jobs to an unreasonable amount. Let's say we have 100'000 tags and we enqueue 1 job per tag: that gives 100'000 jobs at once.
  - We should adapt the way the tags are chunked. For example, instead of one job per tag, we could split the work into a known number of chunks, so that tag deletion is always handled with 10 jobs at most. In the previous example, we would have one job per 10'000 tags.
- Currently, tag deletion is mainly slowed by calls to the Docker API (each API call has to get all the tags from the Docker API, which is expensive). It's not clear how much parallel execution will affect the overall execution time.
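The bounded chunking suggested under the first risk could look roughly like the sketch below. `MAX_JOBS` and `chunk_tags` are hypothetical names for this example, and the limit of 10 is simply the figure used in the text.

```ruby
# Hypothetical upper bound on the number of deletion jobs per request.
MAX_JOBS = 10

# Splits `tag_names` into at most `max_jobs` chunks of roughly equal size.
# The chunk size is the ceiling of (number of tags / max_jobs), so the number
# of resulting chunks never exceeds `max_jobs`.
def chunk_tags(tag_names, max_jobs: MAX_JOBS)
  raise ArgumentError, 'max_jobs must be positive' unless max_jobs.positive?
  return [] if tag_names.empty?

  chunk_size = (tag_names.size + max_jobs - 1) / max_jobs # integer ceiling
  tag_names.each_slice(chunk_size).to_a
end

chunks = chunk_tags((1..100_000).map { |i| "tag-#{i}" })
puts chunks.size       # prints 10
puts chunks.first.size # prints 10000
```

Small inputs produce fewer chunks (5 tags give 5 chunks of 1), so the bound is an upper limit rather than a fixed job count.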
## Involved components
- `Projects::ContainerRepository::CleanupTagsService` is the job that currently loops over the tags. It could enqueue jobs for tag deletion.
## Other considerations
- Note that the API call already enqueues `Projects::ContainerRepository::CleanupTagsService`. This refactoring will improve the background processing time but not the API call's execution time, which will stay the same.