Container registry cleanup policy can delete newer tags pointing to the same image

Summary

If several tags point to the same image, the cleanup policy can delete the newer tags and keep the older ones. As a result, images that are in use by running pipelines can be deleted.

Steps to reproduce

  1. Create a pipeline that uses caching to build an image for each pipeline, tagged with the pipeline ID. If the image does not change between runs, this results in multiple tags in the registry that point to the same image.
  2. Enable the cleanup policy and configure it to remove tags older than 7 days while keeping the 10 most recent tags per image name.
  3. Given enough runs and a long enough pipeline, the cleanup policy can delete an image that is in use by a running pipeline, even though each pipeline always finishes in much less than 7 days.

What is the current bug behavior?

Newer tags for an image name can be deleted while older tags for the same image name are kept.

What is the expected correct behavior?

The cleanup policy behaves as its configuration page describes: "Keep the most recent: 10 tags per image name". The 10 most recently pushed tags should be kept.

Possible fixes

Based on https://docs.gitlab.com/ee/user/packages/container_registry/reduce_container_registry_storage.html#how-the-cleanup-policy-works, I suspect that the following happens:

  1. The tags are ordered by created_date. Because all the tags share the same underlying image, they have the same created_date, so the order is effectively arbitrary. It could also be oldest first, if that is the order in which the tags come from the database or registry.
  2. The first 10 tags are kept, that is, removed from the list of deletion candidates. Which tags are kept therefore depends on the order from step 1.
  3. All remaining tags are deleted, because the underlying image is older than 7 days, even if some of those tags were pushed only recently.
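The suspected behavior can be modeled with a short Python sketch. It is a simplified model of the three steps above, not GitLab's implementation: the Tag class, its pushed_at field, and the suspected_cleanup function are hypothetical. It assumes that every tag reports its image's creation date as created_date, and that tags with equal created_date stay in the order in which they are listed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Tag:
    name: str               # tag name, e.g. the pipeline ID
    created_date: datetime  # assumed: creation date of the underlying image
    pushed_at: datetime     # hypothetical: when this tag was pushed


def suspected_cleanup(tags, keep_n, older_than, now):
    """Return the names of the tags that the suspected policy deletes."""
    # Step 1: order by created_date, newest first. sorted() is stable, so
    # tags sharing one image (equal created_date) keep their input order.
    ordered = sorted(tags, key=lambda t: t.created_date, reverse=True)
    # Step 2: the first keep_n tags are kept, whichever ones they happen to be.
    candidates = ordered[keep_n:]
    # Step 3: every remaining tag whose image is older than the cutoff is deleted.
    cutoff = now - older_than
    return [t.name for t in candidates if t.created_date < cutoff]


# One image built on day 0, re-tagged by 15 daily pipelines (IDs 1..15),
# listed oldest first, as the registry might return them.
built = datetime(2024, 1, 1)
tags = [Tag(str(i), built, built + timedelta(days=15 + i)) for i in range(1, 16)]
now = built + timedelta(days=30)

deleted = suspected_cleanup(tags, keep_n=10, older_than=timedelta(days=7), now=now)
print(deleted)  # ['11', '12', '13', '14', '15']: the five newest tags
```

Passing the same tags newest first instead deletes tags 1 to 5, so in this model the outcome depends only on the listing order, which matches the seemingly random behavior described above.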

It might not be possible to fix this until the new container registry metadata system is completed. Once it is, the cleanup policy should be able to filter on the tag push date instead of the image creation date. That would match the user expectations set by the wording on the cleanup policy configuration page.
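Under the same simplified model, ordering and filtering by a tag push date keeps the newest tags regardless of listing order. This is a sketch of the proposed behavior only: pushed_at and push_date_cleanup are hypothetical names, not fields or functions of the GitLab API or of the registry metadata database.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Tag:
    name: str               # tag name, e.g. the pipeline ID
    created_date: datetime  # creation date of the underlying image
    pushed_at: datetime     # hypothetical: when this tag was pushed


def push_date_cleanup(tags, keep_n, older_than, now):
    """Return the names of the tags deleted when using the push date."""
    # Order by push date, newest first, so the kept tags are the newest ones.
    ordered = sorted(tags, key=lambda t: t.pushed_at, reverse=True)
    candidates = ordered[keep_n:]
    # Only tags pushed before the cutoff are deleted.
    cutoff = now - older_than
    return [t.name for t in candidates if t.pushed_at < cutoff]


# Same scenario: one image built on day 0, re-tagged by 15 daily pipelines.
built = datetime(2024, 1, 1)
tags = [Tag(str(i), built, built + timedelta(days=15 + i)) for i in range(1, 16)]
now = built + timedelta(days=30)

deleted = push_date_cleanup(tags, keep_n=10, older_than=timedelta(days=7), now=now)
print(deleted)  # ['5', '4', '3', '2', '1']: the five oldest tags
```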

It is not clear whether any workaround exists that does not sacrifice either caching or the unique tag for each pipeline.