Currently, there is no way to clean up old revisions of a container image in a registry. As a result, the registry can grow very large, especially when
building containers as part of CI. In many cases users only care about the latest revision. There is an API endpoint in the registry to clean up older revisions, but there are a few problems with it.
Proposal
To solve this problem for users, we will introduce a separate tool that can be run administratively to remove unused or obsolete images, based on administrator preference.
Currently, we provide a Docker Distribution GC (https://docs.gitlab.com/omnibus/maintenance/#container-registry-garbage-collection), but it can only remove layers that are no longer referenced. In today's registry, people push many revisions to the same tag. Due to the architecture of Docker Distribution, all revisions are preserved, which consumes a lot of space. This tool should allow users to wipe these historical revisions from the GitLab Container Registry.
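For reference, the existing collector is invoked as below; as documented at the link above, it only reclaims layers that nothing references any more, and it has read-only/downtime caveats described in those docs:

```shell
# Existing Omnibus command; removes only unreferenced layers.
# See the linked garbage-collection docs for downtime caveats.
sudo gitlab-ctl registry-garbage-collect
```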
I was searching for something to clean my registry.
I've found some scripts that interact with the API, and one that deletes files from the registry folder.
Will something like this work? https://github.com/burnettk/delete-docker-registry-image
Is there any news on this yet?
We are holding off on using our whole GitLab CI chain and the GitHost registry for production until this is solved, as otherwise the amount of extra block storage we would need to add would be extreme.
If there isn't a solution soon then we will have to look elsewhere.
I see that there is a button to delete an image tag in the GitLab registry UI.
There's also a gitlab-ctl registry-garbage-collect command.
I assume that the first simply removes image tags, while the second will remove unreferenced layers/blobs from the filesystem. Is that correct?
Also, there's something wrong with the Remove tag button: when deleting a certain tag, all tags pointing to the same SHA (and therefore having the same image ID in the list) are also deleted. I expect this isn't actually intended behavior and is therefore a bug.
btw... I consider this kind of feature very important for the GitLab registry... because keeping old revisions of images forever, when they are "updated" by pushing them again, makes no sense to me.
@ayufan thank you for your tool, it has been working wonders, much better than registry garbage-collect which over time left 74 GB of stale data:
First non-dry run with docker-distribution-pruner -config=/var/opt/gitlab/registry/config.yml -delete -soft-delete=false. An ideal real-life test case, since our images are just for CI, so there's no real impact if we lose anything.
more feedback from a customer here: "The deletion by image tag would be great if I had a way to do it programmatically, which would mean at a bare minimum being able to programmatically enumerate tags as well as programmatically delete them. Deleting tags by clicking through the UI is completely unacceptable -- way too error-prone and tedious, given the tens of thousands of images we have accumulated over the last year."
@lbot It is possible to do this today using the Docker Registry API; the same API is used by GitLab. A user with master permissions does have permission to delete tags.
@lbot You cannot delete by tag, only by digest, which starts with sha256:. For myself, I found the way to get the digest is docker images --digests.
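To make that concrete, here is a minimal sketch of delete-by-digest against the Docker Registry v2 API; the host, repository, tag, and $TOKEN are placeholders, and the registry must be configured to allow deletes:

```shell
# Resolve the tag to its manifest digest. The v2 Accept header matters:
# without it the registry may return the digest of a converted manifest
# that doesn't match what it actually stores.
DIGEST=$(curl -s --head --fail \
  -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
  -H "Authorization: Bearer $TOKEN" \
  "https://registry.example.com/v2/group/project/manifests/mytag" \
  | grep -i 'docker-content-digest' | grep -o 'sha256:[a-f0-9]*')

# Delete the manifest by digest. This removes the image and therefore
# every tag pointing at it, as discussed above.
curl --fail -X DELETE \
  -H "Authorization: Bearer $TOKEN" \
  "https://registry.example.com/v2/group/project/manifests/$DIGEST"
```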
@dblessing I think the point is not reclaiming the disk space (yet), but having a way to mass-delete "old" images. We also have tons of projects creating masses of new test containers for feature branches etc. I am looking for a way to, in effect, script the trash button in the Registry section of the project.
We are affected by this too - we needlessly accumulated 200 GB over 3 months and hit 100% of the filesystem, causing further pushes to fail. Glad we had it on a separate partition.
Is there an easy way to, let's say, remove the tags of all images older than 1 month, but keep at least the last 3? Then the docker-distribution-pruner would be really useful.
Just realized that CI jobs were crashing because there were only 300 MB of disk space left on the runner. Luckily I found this page and removed the outdated Docker images manually. I am surprised that this hasn't been automated yet.
@ayufan - How do I connect to the gitlab container registry, via the Docker Registry API, if it is secured behind https?
@ayufan - It is possible to do this today, using Docker Registry API. The same API is being used by GitLab. User with master account does have permission to delete tags.
@gdmello I agree with @tholu, HTTPS doesn't change anything; the issue is more related to #40096 (closed), which basically means that from inside the CI it's impossible to obtain enough privileges with the registry to delete a tag.
Hence, @tholu, the pruner is not useful here, since it only deletes dangling references; it's not automatic, and it's impossible to automate it right now.
@tholu - I am a bit confused! :) The pruner is not helpful, as it doesn't allow me to specify which tags should be removed. It is also considered experimental and not for production usage. I only have a single GitLab server for dev and prod projects, so I need the ability to delete images/tags from the dev projects while retaining the images in production.
@gdmello You are definitely right, the pruner is not the final solution for this problem. A better solution, e.g. with the ability to specify automatic deletion of images/tags from certain projects while keeping the last x versions, would be very much appreciated by me as well.
Perhaps you can show your ideas and plans for the solution? Do you have some code already?
I'm no expert in this topic though, just an interested user looking for a solution as well.
@tholu @gdmello We have the same problem here; we can't use the GitLab registry in production without the possibility to automatically prune developers' old snapshot images identified by a pattern.
We are using Gitlab EE
@tholu & @Alessandro.Lai - So I ran docker-distribution-pruner and it revealed that 325 GB on my registry was locked up in unreferenced versions of tags, unreferenced manifests, and unreferenced layers, and thus blobs.
Since I was able to delete most of the images across all my projects with a custom home-grown script a colleague wrote, followed by gitlab-ctl registry-garbage-collect, the registry should have been purged. However, it looks like there is either a bug in the way the registry stores layers, or a problem with the way we untag images in our pipelines.
@yohann.fabri1 - The script my colleague wrote is a bit complex; it uses the backend Postgres DB to get project/repository IDs and then hits the GitLab container registry URLs to retrieve a list of images. I wouldn't recommend this approach, but in the absence of a GitLab internal solution, this is the quickest we could come up with.
This is my last (failed) attempt: I tried to build an image during CI which I would like to delete at the end of each pipeline.
This would be a (redacted) piece of my .gitlab-ci.yml, the last job that should take care of cleaning the registry after each run.
```yaml
delete-commit-image:
  stage: Cleanup
  before_script:
    ### GET JWT TOKEN FOR REGISTRY using $GITLAB_ACCESS_TOKEN as login
    - 'curl https://registry.my-gitlab.com/jwt/auth --get -v --include -d client_id=docker -d offline_token=true -d service=container_registry -d "scope=repository:$CI_IMAGE_NAME:pull,*" --fail --user my-user:$GITLAB_ACCESS_TOKEN'
    ### EXTRACT JWT TOKEN FROM JSON ANSWER
    - 'export CI_REGISTRY_TOKEN=$(curl https://registry.my-gitlab.com/jwt/auth --get -d client_id=docker -d offline_token=true -d service=container_registry -d "scope=repository:$CI_IMAGE_NAME:pull,*" --fail --user my-user:$GITLAB_ACCESS_TOKEN | sed -r "s/(\{\"token\":\"|\"\})//g")'
    ### EXTRACT SHA OF TAG TO BE DELETED USING DOCKER REGISTRY API
    - 'export MANIFEST_SHA_TO_BE_DELETED=$(curl https://registry.my-gitlab.com:4567/v2/group/project/image-name/manifests/$TAG_TO_BE_DELETED --head --fail -H "accept: application/vnd.docker.distribution.manifest.v2+json" -H "authorization: Bearer $CI_REGISTRY_TOKEN" | grep -i "Docker-Content-Digest" | grep -o -i "sha256:\w\+")'
  script:
    ### TRY TO DELETE TAG --- this fails for insufficient scope of token
    - 'curl "https://registry.my-gitlab.com:4567/v2/group/project/image-name/manifests/$MANIFEST_SHA_TO_BE_DELETED" -X DELETE --include -v -H "accept: application/vnd.docker.distribution.manifest.v2+json" -H "authorization: Bearer $CI_REGISTRY_TOKEN"'
```
This approach, as I've already said, fails due to #40096 (closed), which prevents privilege escalation against the registry from inside a pipeline (I know that because the same approach done manually from my console works).
@Alessandro.Lai please provide a complete example (CI_IMAGE_NAME?), obfuscated responses from curl -v, and a decoded bearer token (you could use https://jwt.io/). When I created #40096 (closed), deleting images was working for me as long as I was using personal access tokens.
Maybe I am off here, but could we simply make it possible to remove tags on the images, and then run docker system prune on the server to remove unused and untagged items?
Then we could have an optional feature that allows you to untag builds older than a certain amount of time (say 2 weeks), with an option to whitelist some tags to never be untagged.
@pck the issue here is that we do not have any programmatic way to do it; the only way (for now) to delete a tag is to go into the registry web interface and press the delete button, and that will, in reality, delete the image and all related tags, not just the single tag.
Also, as pointed out in #40096 (closed), doing this from the CI doesn't work well due to restricted permission assigned to the CI registry token that you get inside a CI job.
@ayufan this is quite requested by users; maybe we can take your work on the distribution pruner and improve it to support production environments. What do you think? Having a documented way to clean up images would be a first step.
The workflow that I seem to use the most often is to:
1. Push a new tag consisting of CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
2. Update the tags for $CI_COMMIT_REF_NAME (usually the branch) and latest.
So if only the branch master exists and has been pushed to at least once, I'm going to have registry tags master, latest, and a hash that's tied to the commit SHA.
Over time the number of containers in the registry balloons, because it equals <number_of_commits> + <number_of_branches>. To deal with this, it would be really nice to be able to prune registry tags that don't look like a repository branch name or tag name, over some time interval (e.g. delete all container tags that are older than 3 months AND (are not a current branch name OR a current tag name)). A sketch of that filter is below.
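A minimal sketch of that filter, assuming the standard Docker Registry v2 endpoints, a $TOKEN with delete rights, and placeholder REGISTRY/REPO values; the 3-month age condition is left out because the tags/list endpoint exposes no timestamps:

```shell
REGISTRY="https://registry.example.com:4567"
REPO="group/project"

# All tags currently in the registry for this repository
tags=$(curl -s -H "Authorization: Bearer $TOKEN" \
  "$REGISTRY/v2/$REPO/tags/list" | jq -r '.tags[]')

# Names to keep: current branch and git tag names
keep=$(git ls-remote --heads --tags origin \
  | sed 's@.*refs/heads/@@; s@.*refs/tags/@@')

for tag in $tags; do
  if ! echo "$keep" | grep -qx "$tag"; then
    # Resolve the tag to a digest and DELETE the manifest,
    # exactly as sketched earlier in this thread.
    echo "would delete $REPO:$tag"
  fi
done
```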
Since tags are just links to images (and not a resource in the Docker Registry API, unfortunately), an easier workflow IMHO could be:
- allow specifying a TTL against a tag in some way
- if a tag expires, it should be deleted (but the current Docker API doesn't allow deleting a single tag, just the image, which means cascade-deleting all the tags pointing to that same image)
- if all the tags pointing to an image are expired, the image will be deleted
With this process, not adding a TTL to a tag like "latest" or "master" simply makes that one image non-deletable by the automatic mechanism. All the other ones, especially commit tags, could have a very short TTL (1 day?) and make all the older images auto-delete once the latest/master tag points elsewhere (i.e. at a new commit tag). A rough illustration of these semantics follows.
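Purely as an illustration of those semantics (none of this exists in the registry today): suppose tag TTLs lived in a hypothetical tag_ttls.json, with null meaning "never expires". The key point is the cascade rule: a tag can expire at any time, but the image itself can only be deleted once every tag pointing at it has expired. get_push_time and mark_tag_expired are invented helper names:

```shell
# Hypothetical sketch only; tag_ttls.json might look like:
#   {"latest": null, "master": null, "commit-abc123": 86400}
now=$(date +%s)
for tag in $(jq -r 'keys[]' tag_ttls.json); do
  ttl=$(jq -r --arg t "$tag" '.[$t]' tag_ttls.json)
  [ "$ttl" = "null" ] && continue       # no TTL: keep forever
  pushed=$(get_push_time "$tag")        # hypothetical helper
  if [ $((now - pushed)) -gt "$ttl" ]; then
    mark_tag_expired "$tag"             # hypothetical helper; the image
  fi                                    # goes away only when ALL of its
done                                    # tags have expired (cascade rule)
```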
Sometimes we still reference an old image via a specific tag (all our images are tagged by build number), so a script that deletes by last-created date would delete images still in use. Is there not a last-accessed date for images in the registry, via the API? If we could order by last accessed and keep images accessed in the last N days, that would be better. Old branches that are no longer used would be deleted.
Or could the registry be tied to branches? We can already delete merged branches, and if we could delete the images that were created from a branch, that would tie it up nicely: when a merge request is approved and it deletes the source branch, any images created from that branch should be cleaned up too. Registry labels are often $CI_REGISTRY_IMAGE/$CI_COMMIT_REF_SLUG, which is pretty much the branch anyway.
@CoskunSunali I hope so.
My GitLab server requires me to clean up all the images weekly right now. We are creating hundreds of gigs of unused containers, and it is a pain to clean them up by hand. I have two scripts that do it, but it still should not require us to log in and clean up its mess.
Our CI fails periodically when old images take up all the space. We have to get in and clean them manually right now, which is painful and very inefficient.
I, too, want to be able to easily clean old images. But I also recognize that this isn't something GitLab can easily "automate". They can provide a better way to clean them up, but that's about it. Unless they build a whole thing with "rule-based clean up with webhooks and remote calls", it's still going to require people to write their own scripts.
Let's face it, everyone will want the cleanup to do something different. One will want it to "remove all images that aren't the last build". Another will want "all images older than N days". And others will want to "remove images which I don't use anywhere in production"...
For example, neither "remove all except the most recent" nor "remove all older than N days" works for me. That would leave my production servers unable to pull a container that is still running in production. I would love for my production containers to always run off the latest images, but that's not always the case and may never be possible.
So what we need is a simple, usable API to clean up container images, and something that will automatically reclaim the space after the images have been removed.
It will need to have various selectors such as "N days". But it also needs a flow where you can first get a list of images and tags, then YOUR script checks whether they are in use through your orchestration platform, and then you call the GitLab API again to delete the images you don't use.
@etlweather It would be great if we could clean via tags, N days, by branch, and a variety of other filters. All of our production images are created from branches starting with 'production-*'. If we could remove anything else after 2 days, that would be great.
@etlweather As I understand this issue, it is about removing images from the registry that are not visible in the GitLab UI anymore (usually because the tag has been superseded by a new image or because it has been deleted manually). As long as it is there, it should not be deleted.
That wouldn't work for you? Are you using images by their hash and not through a tag? In other words, are you using images that you can't see in the GitLab UI?
I respect what everyone else says about deletion rules, etc.
My request is simpler. If I delete a tag/image via the Registry page within a project, I would like that tag/image to be deleted physically from the registry. Nothing more, nothing less.
I am not a Docker expert, so I would understand if this is not doable/feasible, e.g. if newer versions of tags actually reuse layers from earlier versions of tags.
However, at minimum, if an image is completely deleted, I guess it could be deleted physically too.
My use case is, I have a project which automatically builds/pushes new docker images to the registry and tags them. The automation supports deletion of images completely, including the tags of course.
The physical files remaining on the server become a problem when we want to back up the GitLab server.
Our backup strategy backs everything up every hour, 24 times a day, to external storage of course (multiple, actually).
So at the end of the day, if a 1 GB image has been deleted (aspnetcore-build, for instance), it still costs 24 GB of unnecessary space on our backup storage, forever, unless someone finds a way to delete those images completely. That is just for one day. We keep backups for the last 30 days. Multiply 24 by 30 and that is 720 GB of unnecessary space consumption, just for one image.
Deleting those files from the runners is easy: just use the Docker CLI. But GitLab stores the physical files for those images under /var/opt/gitlab/gitlab-rails/shared/registry/docker/registry/v2/blobs/sha256, and that directory eats all the available space over time.
@CoskunSunali Indeed. Down the page there is an even scarier warning:
Warranty
The application was manually tested; it was also run in dry-run mode against large repositories to verify consistency.
As of today we are still afraid of executing it on production data. There's no warranty that it will not break the repository.
Still, we had to find a solution. So we ran it in dry-run mode first for a while, then added -delete, and finally we now use -delete -soft-delete=false and rely on backups if need be (made daily, with 6 months' retention). So far (8 months in) it has stood the test of time (not a warranty of any form, though). The staged rollout looked roughly like the sketch below.
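For anyone wanting to repeat that progression, using only the flags already mentioned in this thread (a sketch under the assumption that the tool's defaults haven't changed, not a safety guarantee):

```shell
CONFIG=/var/opt/gitlab/registry/config.yml

# 1. Dry run (the default): only reports what would be deleted
docker-distribution-pruner -config=$CONFIG

# 2. Soft delete: objects are moved aside rather than removed outright
docker-distribution-pruner -config=$CONFIG -delete

# 3. Hard delete: objects are removed permanently; keep backups handy
docker-distribution-pruner -config=$CONFIG -delete -soft-delete=false
```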
It's ridiculous that this is still an issue after a year, all the while GitLab is touting the Container Registry feature. Not only is there no API to delete images (#21608 (closed)), but even if you delete them manually through the UI, it doesn't actually reclaim any disk space. Why is this so hard? No one would adopt this feature knowing this upfront.
Cleaning up old or unwanted images is a basic feature that should be available. Again, why is it so hard? Is it a big ask? Any update from our friendly GitLab on this?
I might be wrong, but I have a feeling that GitLab development is driven too much by sales people (too many links to those mysterious Zendesk customers) rather than by tech people.
There are several stories like this one that the [bright] community needs, but they stay ignored.
It's not how GitLab started, and not what ensures the long-term greatness of the product.
Yes, GitLab will get an influx of former GitHub customers, due to its sale to we all know who, but in a year or two the momentum might be lost.
Don't be surprised if in 5-10 years GitLab itself will be bought by MS.
On the bright side, consultants like myself will always have bread on the table improving GitLab... something that happened with TFS and other proprietary tools earlier.
I really hope they'll listen and give power back to those who understand what tech is needed long term, rather than continue gathering low-hanging fruit.
Just to respond real quick to the last comment: from the support side, we use Zendesk for our ticketing system, so the ZD/Zendesk links are there to help us, the support team, close the loop and track issues. Not the sales team (we still love you, sales guys :) )