The GitLab Cleanup policy for tags has been rolled out to all new projects; however, it does not cover existing projects. This prevents admins from running automated cleanup on existing projects and from truly lowering the cost of their container registry.
The policy utilizes the Container Registry API to bulk delete tags. That particular API can be slow and has caused issues for customers in the past. Limiting the feature to new projects was done to:
Limit the risk of the API failing for all of our customers at once
Get feedback on the feature. For example: does the project-level approach make sense, are our default settings sensible, and are people able to find the feature?
Rollout Plan
Roll out performance improvements to the Container Registry delete API to make deletions asynchronous.
Add throttling to prevent bulk delete jobs from overloading the system
Enable the feature by default for GitLab.com
Enable the feature by default for self-managed instances
Alert the user (in the app) when the policies will run
Performance gains
It's important to note that only instances utilizing the GitLab Container Registry will see the performance improvements. If you are using something like ECR, the jobs will still run, but will generally be slower.
Proposal
Roll out the Cleanup policy to all existing projects for both self-managed instances and GitLab.com. The feature should default to on for GitLab.com and GitLab self-managed instances to allow users to lower the cost of their container registry.
By default, the feature will be enabled for all projects with the settings below (a hedged example API request follows the list):
The expiration interval is set to 90 days
The expiration schedule is set to run weekly
The number of tags to retain is set to 10.
Expire images matching the regex `.*` once they are older than 90 days and there are more than ten matching images
Preserve images matching the regex `.*master|.*release|release-.*|master-.*` so that master and release images are always preserved.
latest images are always preserved by default
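For illustration only, here is a hedged sketch of what those defaults could look like when applied to a single project through the Projects API. The `container_expiration_policy_attributes` parameter names and accepted values are assumptions based on the public API and may differ between GitLab versions; `YOUR_API_TOKEN`, `YOUR_GITLAB_DOMAIN`, and `PROJECT_ID` are placeholders.

```shell
# Hypothetical example: apply the proposed defaults to one project via the Projects API.
curl --request PUT --header "PRIVATE-TOKEN: YOUR_API_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
    "container_expiration_policy_attributes": {
      "enabled": true,
      "cadence": "7d",
      "older_than": "90d",
      "keep_n": 10,
      "name_regex_delete": ".*",
      "name_regex_keep": ".*master|.*release|release-.*|master-.*"
    }
  }' \
  "https://YOUR_GITLAB_DOMAIN/api/v4/projects/PROJECT_ID"
```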
Permissions and Security
There are no permissions changes required for this change.
This page may contain information related to upcoming products, features and functionality.
It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes.
Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.
I'm not a GitLab developer, but as a fellow developer, I don't really get it: how can new functionality in the GitLab platform end up limited to only new projects, with existing projects not covered?
Ah ok - it was a design decision. The container registry uses a lot of storage and relies on an API to untag images. We were concerned about scalability and so decided to include the feature on new projects, while we figure out how to scale to include all existing projects.
Thank you @alexviscreanu @julien-lecomte @ccasella @ndom91 for the feedback. I apologize for the inconvenience of the feature not supporting existing projects, but I am excited to hear that the feature will be useful once we expand support.
tl;dr: We are meeting this week to discuss what has to be true to roll out support for existing projects, and whether we can do this for only self-managed instances first and GitLab.com later.
Background
The Expiration policy utilizes the Container Registry API to bulk delete tags. That particular API can be slow and has caused issues for customers in the past. Limiting the feature to new projects was done to:
Limit the risk of the API failing for all of our customers at once
Get feedback on the feature. For example: does the project-level approach make sense, are our default settings sensible, and are people able to find the feature?
What's next
We will have a conversation on how to make this feature available for all projects and identify any blockers to making that possible. Once we have that conversation, I'll update this thread with more details. This is a priority for us and we will hopefully get this working for you as soon as possible.
In the meantime, you can always use the bulk delete API to delete tags and let us know if you implement this for new projects and if you have any additional feedback.
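For reference, here is a hedged example of the bulk delete endpoint mentioned above. The project and repository IDs are placeholders, and the exact parameters (`name_regex_delete`, `keep_n`, `older_than`) should be checked against the registry API docs for your GitLab version.

```shell
# Hypothetical example: bulk delete tags in one image repository,
# keeping the 5 most recent tags and only touching tags older than a month.
curl --request DELETE \
  --data "name_regex_delete=.*" \
  --data "keep_n=5" \
  --data "older_than=1month" \
  --header "PRIVATE-TOKEN: YOUR_API_TOKEN" \
  "https://YOUR_GITLAB_DOMAIN/api/v4/projects/PROJECT_ID/registry/repositories/REPOSITORY_ID/tags"
```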
Hello, we have customers with on-premises installations who are asking for this. Is there any way to activate this feature for all projects in these environments?
Please add us to the list (ref). I won't reiterate what everyone else has said, but we've been looking forward to this for quite a while and it's very disappointing that it's not usable for every single situation where it would help.
I just wanted to run a crazy (dangerous?) idea as a workaround right now. In the video linked above around (3:52), @sabrams discussed how the registry cleanup feature determines whether a project is "new" (created after 12.8) or "old" (before 12.8): the database entry for projects created after 12.8 will have a "container expiration policy" field, and the assumption is that if this field exists, then the project is "new." Did I capture that correctly?
So, the crazy idea would be to manually edit an old project's database entry to include a disabled container expiration policy. At this point, the UI will show up and can be used at one's own risk. Does that make sense? Would it work?
Naturally, this workaround would only be possible in self-managed instances.
Can confirm that this works. Created a new dummy project and changed the project_id in the container_expiration_policies table to another project. The dummy project no longer has the policy and is marked as 'new', and the old project has a policy that can be changed.
```shell
sudo -u gitlab-psql /opt/gitlab/embedded/bin/psql -h /var/opt/gitlab/postgresql -d gitlabhq_production
```

```sql
SELECT * FROM container_repositories;
SELECT id, name FROM projects;
UPDATE container_expiration_policies SET project_id = X WHERE project_id = Y;
```
To mark all projects as eligible for tag expiration, use this:
```sql
-- Add a disabled, empty policy row for every project that does not have one yet.
INSERT INTO container_expiration_policies
  (created_at, updated_at, next_run_at, project_id, name_regex, cadence, older_than, keep_n, enabled)
SELECT now(), now(), now(), id, '', '', '', NULL, 'f'
FROM projects
WHERE id NOT IN (SELECT project_id FROM container_expiration_policies);
```
After that it should be possible to configure the expiration for every project. This will not enable the expiration by default.
The following should enable it for every project, delete everything tagged as git_... older than 7 days, keeping at least 1 image available:
```sql
-- Add an enabled policy for every project without one: run daily, delete tags
-- matching git_<40-char SHA> that are older than 7 days, and keep at least 1 tag.
INSERT INTO container_expiration_policies
  (created_at, updated_at, next_run_at, project_id, name_regex, cadence, older_than, keep_n, enabled)
SELECT now(), now(), now(), id, 'git_[0-9a-z]{40}', '1d', '7d', 1, 't'
FROM projects
WHERE id NOT IN (SELECT project_id FROM container_expiration_policies);
```
Existing configurations are not touched by either SQL statement.
By the way if you are using docker-compose, you can use this command to get into the SQL console:
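The original command didn't carry over here; one likely form, assuming an Omnibus-based image with a docker-compose service named `gitlab` (both the service name and the wrapper are assumptions), would be:

```shell
# Hypothetical: open a psql session inside the GitLab container via the Omnibus gitlab-psql wrapper.
docker-compose exec gitlab gitlab-psql -d gitlabhq_production
```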
I noticed that in our on-premises instance the table container_expiration_policies is not empty and contains projects created from 2020-01-24 onward (first entry), so these are definitely not new projects created after the upgrade to version 12.8. We have 5 projects in this table, all created before the upgrade to 12.8.
@trizzi Is this a bug, or did the table serve some purpose in previous versions?
@nosmo, good catch, the table itself was introduced earlier (it just didn't do anything yet). I believe the table went out in 12.7, so yes, any projects created after the table existed would have the ability to use the container_expiration_policy.
#208220 (closed) which aims to improve the performance of the delete API, is currently ~"workflow::In review". We are seeing a 94% improvement in performance. Check out !27441 (merged) for additional details on benchmarking and performance testing. Thanks @jdrpereira!
Yeah if this is really only a design decision, I can't understand not allowing admins to enable it completely for pre-existing projects.
If there are no technical roadblocks, why not just make it opt-in at first? Don't disallow it entirely, though.
Really looking forward to this once the kinks are worked out though because container images are killing my disk space haha
Tim Rizzi changed title from "Expand Docker tag expiration and retention policies to all existing projects" to "Expand Docker tag expiration and retention policies to all existing projects for GitLab self-managed instances" and changed the description.
I've previously instructed people to avoid creating large numbers of container images/tags due to the lack of automatic cleanup
I would like to enable automatic cleanup for existing/old projects so we can start using container images/tags in a more flexible way
I would be OK with needing to manually enable a server-side feature flag (e.g. setting a value in /etc/gitlab/gitlab.rb for the omnibus package) to opt-in to this feature in my environment before the other changes for performance improvements are ready.
Thanks @kepstin, this is definitely the workflow we are working towards. We have a few performance improvements to make and some configuration settings to add, but we'll have this available for all projects as soon as possible.
Does this mechanism allow for the deletion of images with a certain tag upon a 'Merge Request' merge, or just based on time of upload?
In my organization's projects we use $CI_COMMIT_REF_SLUG as the tag for Docker images.
This forces us to periodically delete all images from old branches. While searching for a way to delete them automatically, I found this thread. I have no idea if this is implemented or not, nor if this is the right place to make a suggestion. My other option is to create a custom script to delete them through the API.
So I will have to continue manually deleting tags and wasting my time, even though the feature to do exactly that is now present in our on-prem instance, because someone decided not to let me decide for myself whether I want to try that feature, knowing all the risks.
Wow. Next time, I would consider whether to even release such a half-baked feature if users can't use it.
I like GitLab, but some decisions are hard to grasp.
Expiration and retention policies are a very important feature to us, and we will continue to iterate on it and improve it. Sorry for the inconvenience, but stay tuned!
Nice, I've never read such elaborate company values. Thanks for pointing me to this.
Back to topic: as someone else wrote here, it's about managing expectations. When a feature is advertised in the release info as if fully complete, you can't expect that people won't be a little disappointed when they learn the truth. It would have been completely OK for me if there had been a note that this feature is rolled out in incremental steps and further releases will deliver more. Just my 5 cents; thank you for your hard work.
Seems like it's essentially a canary deployment. They want to test a feature in production without affecting the majority of users, and especially without impacting potentially huge code bases.
So I've set up a test container registry to give it a go.
From the ability to set when the job is scheduled, I gather that it's not part of the usual gitlab-ctl registry-garbage-collect -m cleanup command. So my question is: when is it actually run, or more precisely, as part of which regular job does it run? Where can I find logs (or something similar) to see whether everything is running smoothly and within expectations? I didn't find this information anywhere; it might be a good thing to at least mention it somewhere so that we instance admins know where to look.
@nosmo tag expiration runs as an asynchronous job, so you should be able to see the activity in the Sidekiq logs (https://docs.gitlab.com/ee/administration/logs.html#sidekiqlog). There you should see jobs named ContainerExpirationPolicyWorker followed by CleanupContainerRepositoryWorker.
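As a rough illustration, on an Omnibus install you could watch for those workers with something like the following. The log path is an assumption for Omnibus packages; adjust it for your installation type.

```shell
# Filter the Sidekiq log for the cleanup-related workers (assumed Omnibus log location).
grep -E "ContainerExpirationPolicyWorker|CleanupContainerRepositoryWorker" /var/log/gitlab/sidekiq/current
```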
Regarding the gitlab-ctl registry-garbage-collect command, once the expiration policy workers delete tags, the blobs exclusively associated with those tags become eligible for deletion. Therefore, in the next garbage collection run, those blobs will be deleted.
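For completeness, a minimal sketch of that follow-up step on an Omnibus install; check the Container Registry administration docs for any read-only/maintenance requirements before running it.

```shell
# Reclaim blobs left unreferenced after tag deletion.
# -m also removes manifests that no longer have any tags pointing at them.
sudo gitlab-ctl registry-garbage-collect -m
```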
Can we please update the existing documentation ASAP to state the limitation that projects created before version 12.8 cannot use the tag expiration feature?
We have one registry with multiple tag name patterns, and catching all of them in one regex rule isn't very readable. We use something like '^(naming rule 1)|(naming rule 2)|(naming rule 3)|....$'.
It would be more readable if we can define many separate rules for tag names that will work in OR fashion.
But maybe our use case is rare and this is a 'set and forget' kind of feature.
@trizzi that would be nice. As of now, we are OK with what is actually implemented, even if it's a little cumbersome. Multiple rules with different retention policies sound good.
When checking whether everything is working as it's supposed to, I noticed that one of our images in the registry doesn't get its tags expired, but all the others do. The policy is set to keep only 1 tag per image name older than one week (expiration interval). It works on all other images, but not on this one.
Could it be that all of the tags here except one have the same hash & timestamp?
@nosmo @trizzi this is a limitation of the current implementation. To prevent the unintentional deletion of the latest tag, any tags that point to the same manifest (Image ID in the UI) won't be deleted. In the example above, tags 104, 105, 107 and 108 point to the same manifest as latest (875921319), so they won't be deleted. See #15737 (closed) for some historical context.
Fortunately, in 12.10 we're going to change how tags are deleted in the background (!27441 (merged)) and this will no longer be a limitation. It'll be possible to delete any tag individually, without causing the deletion of the associated manifest. We're also improving this feature's performance when using the GitLab Container Registry.
A premium customer https://gitlab.my.salesforce.com/0014M00001kHnQE (GitLab internal link) named this their number 1 priority, as their infrastructure team is trying to get growing costs for space usage under control. They have noticed that the Container Registry is consuming a large part of the space and have updated to %12.8 in order to make use of the Container Registry Expiration policy, but were unable to use it on existing projects.
@dzalbo The good news is that in %12.10 we are adding a configuration option for self-managed instances to turn the expiration policies on for all existing projects. (#208735 (closed)). So, hopefully they'll be able to start using the feature more broadly sooner.
In %13.0 we will work on #208193 (closed) which will throttle the deletion process to improve scalability. And then in %13.1, in this issue, we'll turn it on by default for all projects.
We have really been waiting for this feature for a long time. Currently, we need to manually prune all projects regularly, as we are heavily relying on docker images which quickly sum up to >1TB of space.
Glad to hear it is finally enabled for old projects as well.
But I would really like to be able to control the retention policy more finely. We create and upload images for each and every feature branch and use them for the review apps. After the review app is deployed (or at least after the feature branch is deleted) these images are not required anymore. Thus, I would really like to be able to delete all tags for these images. As it stands, at least 1 tag per branch is kept, which will add up over time as well. Maybe an option to delete related images when an MR is closed/merged would be a good idea?
What if we added an option to keep 0 tags per image? We also have an issue open to expand the policies to the repository (image) level. #37242
Yes, the "keep 0"-Option is definitely needed. But not sufficient. I think, what we need should be more like multiple retention rules and / or #26983.
Each feature-branch creates a new docker repository in our case and these can be purged as soon as the feature is merged. Additionally, from the maintenance branches (yes, multiple in parallel) we would like to keep e.g. the latest image.
@trizzi Any chance to disable/remove this annoying message?
It is shown in the Container Registry in every project, and, well, it is really just occupying screen space and doing nothing useful. I didn't find any setting to disable it, nor is there an [x] to close it for good.
Instead of this, I would prefer to see more image names in the list below.
I know there are users who need to be informed, but in my experience even those users will be annoyed once they've gotten the message.
We have a concept of cookie/locally saved 'closed' alerts, so we could display it once and then, if the user wants to 'x' it away, the closed state is stored in the session cookie / local storage and the same alert for the same project is not shown anymore.
If we wanted this 'closed' state to persist across logins/computers/browsers, we would need to add it as a value in the DB. In this case, though, I would aim towards collapsing the message rather than making it disappear completely.
Another option is to have a settings option that disables the banner
Maybe we can add the cookie/locally saved alert as an MVC?
The question I have is, when is the best time to be informed? Do people need to know that it's going to run in 36 hours? Or should we only alert them at 8 hours, 2 hours, and 5 minutes?
For reference, the updated Container Registry UI contains a permanent element in the interface showing users when the expiration policy for that registry will next run. You can view that issue here: #216749 (closed)
Maybe we can add the cookie/locally saved alert as an MVC?
I think this is a good MVC solution as well. As the expiration policies become more familiar to users, this alert will eventually be removed, or possibly changed to a 1-time alert informing the user that this is happening.
The question I have is, when is the best time to be informed? Do people need to know that it's going to run in 36 hours? Or should we only alert them at 8 hours, 2 hours, and 5 minutes?
Because users only see this alert when they visit the UI (instead of receiving a To Do or email alert), we may want to think of a longer timeline. Maybe 3 days in advance of the policy running, then giving the user an option to dismiss the alert after they've seen it?
Tim Rizzi changed title from "Expand Docker tag expiration and retention policies to all existing projects for GitLab self-managed instances" to "Enable Image expiration policies for all existing projects" and changed the description.
@ccasella I'm not sure if the user knows, but they (or their admin) are able to configure GitLab to enable this feature for all existing projects. They can do this via application settings. This was released in 12.10 via #208735 (closed)
Thanks for the info - just enabled it and works great. I wonder why this is not enabled by default for Omnibus instances on 13.1 and later. Would have saved some time googling this and then issuing the REST commands. For anyone struggling with this as well, the command is:
```shell
curl --request PUT --header "PRIVATE-TOKEN: YOUR_API_TOKEN" \
  "https://YOUR_GITLAB_DOMAIN/api/v4/application/settings?container_expiration_policies_enable_historic_entries=true"
```
@makarovdenis11 The feature is available at the project level, including for free, private instances. This was done to help give more granular control over the registry. If you need to define the policy for multiple projects, I recommend using the API
@makarovdenis11 I will chime in here! That message is referring to the fact that the project in question was created before GitLab version 12.8, and thus the expiration policy is disabled. We recently released the ability to enable it for every project, but it needs to be turned on by the instance admin.
maybe we need to iterate on the message if it causes confusion?
@nmezzopera What do you mean by instance admin? I use cloud-based GitLab on gitlab.com and don't have my own separate GitLab instance on a server. Can you send a docs link where the feature activation process is described?
Tim Rizzi changed title from "Enable Image expiration policies for all existing projects" to "Enable Tag Cleanup Policies for all existing projects" and changed the description.
Tim Rizzi changed title from "Enable Tag Cleanup Policies for all existing projects" to "Enable Cleanup policy for tags for all existing projects" and changed the description.
Tim Rizzi changed title from "Enable Cleanup policy for tags for all existing projects" to "Enable Cleanup policy for tags for all existing projects on GitLab.com".
@trizzi we may need to consider a phased rollout for this. The cleanup policies are currently not available for projects created before GitLab 12.8, and based on #208193 (comment 363465688), it looks like we have ~420K repositories in that condition. If we enable this for all of them simultaneously, we may put the Container Registry and the GitLab workers under severe load.
Even with throttling enabled, I think it would be wise to approach the rollout in batches, for example, enabling it for all projects created since 12.7 first, then since 12.6, and so on, until they're all enabled. This would not only enable us to isolate possible issues, but we could also learn from each phase and adapt the rollout approach accordingly (e.g. increasing or decreasing the target releases and/or adjusting the throttling settings).
We should also make sure that the cleanup policies are not enabled by default for existing projects during this rollout, as we have some projects where we might not want to expire any tags for historic reasons (e.g. gitlab.com/gitlab-org/build/CNG).
UPDATE: This issue has to be pushed back to %13.4. We are still working on #208193 (closed), which will address performance and scalability concerns by throttling the number of delete requests that can be enqueued.
Improving this feature remains a top priority for us. Please stay tuned.
@trizzi I see this issue title is for just gitlab.com now but description says for self-managed? Is there a separate issue to get this rolled out as default for self-managed?
@bbodenmiller Thanks Ben for reaching out. I updated the title/description to make it clear that we need to enable this feature by default for both .com and self-managed instances.
In the meantime, for your self-managed instance you can already enable this feature for all projects. There is an application setting https://docs.gitlab.com/ee/api/settings.html#change-application-settings that you can adjust that will allow you to turn the feature on. But, agreed it should be on by default.
One concern that we have is that we will auto-enable the feature for a self-managed customer that is using an external registry like ECR, where the performance is poor. We are working on a separate issue that will help us to understand how common that is. But, I think we can enable it by default and it can always be turned off.
@trizzi thanks for the note. I saw the setting to enable for self-managed instances but wondered if perhaps it is unsafe for some instances to do so for similar reasons as why it's not enabled for .com?
@bbodenmiller The issue with .com is just the sheer scale. We have so many projects/images that could be deleted.
There are still performance concerns for self-managed. But the bigger concern is that a self-managed instance may be using a 3rd party registry, which doesn't have a lot of the performance improvements that we've been making to Docker Distribution Registry code. If it's urgent, you could try. If not, we will hopefully be ready to roll the feature out for all projects in 13.4, pending testing.
Tim Rizzi changed title from "Enable Cleanup policy for tags for all existing projects on GitLab.com" to "Enable Cleanup policy for tags for all existing projects on both GitLab.com and self-managed instances" and changed the description.
UPDATE: I wanted to provide an updated status on this issue:
We are working on throttling to address some performance concerns in 13.3. (#208193 (closed))
Once that's done, we will begin testing with a few different GitLab.com customers. If you are interested in beta testing this feature for all of your historical .com projects, let me know.
When we are satisfied with the performance and scalability, we will roll the feature out for everyone.
Separately, we have been seeing several issues reported that tags are not being deleted, despite the policy being enabled. We are a bit short-handed at the moment, but we are hoping to dive into these issues in the coming weeks. Thank you for your patience as we work through the following issues:
Finally, we've been working on a new design of the Cleanup policy UI and would love to schedule a few user interviews to ensure that designs make sense and are an improvement over what we have today.
If you'd like to participate, you can send me an email at trizzi@gitlab.com. We are looking to interview 5-8 different people.
Thank you @tylerr92! We are still working through the throttling issue, but we'll reach out within a couple of weeks to discuss. Looking forward to it!
UPDATE: We are making good progress on throttling and testing the policies. We are hoping to start a controlled rollout of this feature in 13.5. We'll keep you updated with our testing results here. Thank you everyone!
UPDATE: We are making good progress on adding throttling to the Cleanup policies and will begin a phased roll-out of the feature for historical projects in %13.5.
However, I don't think we'll be ready to release it for all namespaces/projects during that time. So this issue will slip to 13.6 and we'll begin #244050 (closed) in 13.5.
@kencjohnston@jyavorska This issue will enable the Cleanup policies for all historical projects by default for GitLab.com and Self-Managed instances.
The default regex will be:
Expire images matching the regex `.*` once they are older than 90 days and there are more than ten matching images
Preserve images matching the regex `.*master|.*release|release-.*|master-.*` so that master and release images are always preserved.
latest images are always preserved by default
We'll only do this after carefully following a phased rollout of the feature and starting with internal GitLab projects first to test the performance and functionality.
Can tags for <version core> patterns (not <valid semver>) as defined on semver.org be included in the default set? So, something like \d+\.\d+\.\d+? I think they should be considered like the release tags.
@sabrams Some extra considerations for !44757 (merged). It seems like we may end up having a fairly long default list of regexes. We'll need to think about how to communicate that in the UI. Maybe just link to the docs and have a note that communicates that we will not include any image that contains release or master or follows a valid SemVer.
@trizzi, was preserving *master and *release a request from users? I wonder if the default should be preserving only tags that match a SemVer version, as requested in this thread.
For example, I'm afraid that *master would skip tags built with every commit, which should be erased in most cases IMO. If we only keep SemVer tags by default we could use a single regexp (source):
I'd argue that only <version core> should be kept as a default, and users should explicitly choose to keep alpha, beta, prerelease and other tags (i.e., provide documentation for the full pattern). That cuts down the regexp to:
```
^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$
```
As for *master and *release, I also question their utility. But, preserving latest would be good since that's the implicit image tag if no tag is specified. If users make use of that, it should be preserved along with the semver tags.
Thank you for the feedback @kinghuang and @jdrpereira Your suggestions make sense to me, thank you for clarifying that we should default to preserving only tags that match a SemVer version.
@trizzi I just want to make sure the language doesn't cause any confusion:
This issue will enable the Cleanup policies for all historical projects by default for GitLab.com and Self-Managed instances.
This issue will make cleanup policies available for all historical projects on GitLab.com and Self-Managed instances, but users will have to enable them if they want to use them. It will not enable any cleanup policies for existing projects, as that would delete image tags without users first agreeing to enable the policy.
Some extra considerations for !44757 (merged). It seems like we may end up having a fairly long default list of regex. We'll need to think about how to communicate that in the UI.
The default regexes will show up in the cleanup policy form. I think the main thing to communicate is that, on a new project, users will see the policy is enabled and be guided to visit the settings so they can change the defaults before creating substantial numbers of images.
@twk3 Do you have an opinion on the default logic for which images to always preserve?
As the default, either is fine with me, but version core is likely easier for users to understand whether they should stick with the default or change it.
My only additional suggestion would be support for the v prefix, which is going to be fairly common if users have their GitLab CI push images to the registry during git tag pipelines.
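To make the suggestion concrete, here is a quick, hypothetical way to check which tags a "version core with optional v prefix" pattern would retain; the regex itself is an assumption based on the discussion above, not the shipped default.

```shell
# Sample tag names piped through the keep-pattern; grep prints the ones that would be preserved.
printf '%s\n' 1.2.3 v1.2.3 1.2.3-rc1 master-1a2b3c latest \
  | grep -E '^v?(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)$'
# prints: 1.2.3 and v1.2.3
```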
One more clarification about the regexes: the default retention regex will only be automatically applied to new projects, and the setting is completely configurable. So when existing projects get the ability to turn on cleanup policies, you will see the default regex in the form, but you can change it to whatever you would like before enabling the policy.
I've also shared some additional feature ideas around the cleanup policy regex settings and form UI in #263775.
@trizzi Are you planning to make it configurable which defaults apply to projects in a group?
In our use case, we would like all our projects to have a specific default cleanup policy, so that we don't have to remember to set this up for each new project we create in our organisation.
I believe that the frontend work around this is minimal, so this has frontend-weight1. I'm not adding the weight on the issue because I believe the backend weight will be different.
UPDATE: Unfortunately this is going to slip a bit further. It turns out that the logic for introducing throttling required many small changes prior to actually throttling the workers.
If you are interested in beta testing the feature, you can simply provide your project path in the issue #244050 (closed) or by emailing me at trizzi@gitlab.com.
The testing process will basically include:
Project owner: share your project path with the Package team (example: gitlab-org/gitlab).
Package team: enable the feature flag container_expiration_policies_historic_entry for the given project.
Package team -> Project owner: Communicate that the cleanup policy form is now unlocked. (located at #project_path#/-/settings/ci_cd)
Project owner: Set up the policy attributes and enable the policy.
Project owner -> Package team: Communicate the next run of the cleanup policy
Package team: at the given execution time, monitor how production reacts
Update: As we've been rolling out the policy for historical projects, we've found some performance improvements that need to be made to accommodate large container image repositories. In 13.8 we'll work on #288812 (closed) and hope to address this issue in 13.9/10
**Update:** We are going to have to delay enabling the policies for all historical projects on GitLab.com for a few milestones. We've made several performance improvements to the feature, but unfortunately, we do not feel comfortable enabling this feature for all projects.
The primary reason for this is that in Docker distribution registry, the image manifests are stored in object storage, which requires pinging the Docker API for each tag in an image repository. Even after breaking the work into smaller chunks, this doesn't scale well for large repositories.
The solution for this will be updating GitLab's fork of Docker to store image manifests in Postgres instead of object storage. This will make finding the list of tags to be deleted much easier.
If your organization needs this feature turned on, reach out to me (trizzi@gitlab.com) and we can enable the feature for your workspace. For Self-Managed customers, you can already enable the feature for all projects in GitLab application settings by setting container_expiration_policies_enable_historic_entries to true.
@chloe If they are on Self-Managed, they can do this now by adjusting their GitLab settings. For SaaS, we are beginning the percentage-based rollout for historical projects in 14.0. Although this process may take a few months.
In the meantime, if the customer (or anyone else in this issue) is interested, we can manually enable the feature for them on .com. Just have them send me an email or a confidential issue listing the projectIDs they'd like the policies turned on for.
@trizzi Is this something that requires any additional actions? We have a self-hosted instance on 14.0, and I enabled the "Enable container expiration and retention policies for projects created earlier than GitLab 12.7." checkbox in the admin area. However, on the project where I am testing this feature, the number of tags is not decreasing, although I am sure there are tags old enough to be removed that match the removal regex and don't match the keep regex.
Are there any logs that I could check to verify whether or not the tag removal is actually running for the project?
We're using Helm Chart installation, with Minio as object storage (also for the registry).
@rgembalik One common issue we see is that the cleanup policy contains a .* in the field for images to delete. But that is just example text. You have to explicitly type .* and save the policy for it to work. Although I thought we resolved that by 14.0.
So the fields for regex are both filled. Both are essentially several OR-ed patterns.
The one for removal is like feature-.*|bugfix-.*|hotfix-.* and the one for keeping is something like master|test-.*|develop (they are a bit more specific in our project, but this is the general idea).
I've tested them so far in the form a-.*|b-.*, (a|b)-.*, (a-.*|b-.*), and now I am waiting for results for (?:a-.*|b-.*) (this one because I found a similar example in the docs). I am changing both of them at the same time in the same way, just using different parenthesis combinations.
I'll test the manual removal code in the cli on Monday and let you know about the results.
@trizzi The latest version of the pattern (using anonymous groups) did work during scheduled removal (without manual commands in the CLI), or at least I can see the tag count reduced. It's a bit confusing because the limitation on using groups is not mentioned in the documentation, and I am not sure whether that's something I messed up earlier or an actual limitation. I think this will partially be fixed by #223732 (closed), but even then, such specifics of the regex pattern should be explicitly mentioned in the docs.
So it worked with anonymous groups but not without?
Yes, if I remember correctly the last change was only adding the ?: to the group. I did it only because I saw it in the docs as an example. I am not 100% sure that was the problem, because once it worked I no longer had images to test it on. I will try to pay closer attention to it, and if it happens again, I'll create (or look for) a separate ticket directly about the regex.
Update: We've been enabling the policies for historical projects on GitLab.com. Currently we've enabled the feature for 5% of projects. We plan to follow the schedule in the epic &6405 (closed).
We are hoping for 100% coverage by milestone 14.5.
For the last few days, we have observed the background jobs struggling to keep up with the increased load we're seeing in the cleanups.
We tried to bump the number of parallel jobs allowed for cleanups (from 5 to 7) and we didn't get the expected impact. The jobs are still struggling with the load.
In more detail, each container image receives one cleanup when the policy is triggered. For container images that have really long cleanups (due to a long list of tags), we saw a sharp decrease in the number of cleanups they receive daily. Note that the system is behaving properly: the "resumed" cleanups are put to one side when there are cleanups from a newly triggered policy. In other words, the two priorities (high and low) are working as expected. We simply observe an increased load on the low priority, so the backend can't keep up with it.
Implement some caching in the cleanup jobs to make fewer container registry API calls and thus make the jobs more efficient. Issue #339129 (closed).
We're going to pause the rollout (currently at 20% of projects + 3 batches of selected projects) of this feature to implement (2.). We will resume when the caching is in production and we see the intended effects.
Link to request: GitLab team members may click the linked mention to view additional detail for this request within the customer's collaboration project.
@10io the linked docs still say "For self-managed GitLab instances, the project must have been created in GitLab 12.8 or later" so either the docs are wrong or this isn't fully addressed.
We have an application setting called container_expiration_policies_enable_historic_entries that controls whether cleanup policies apply to all projects (setting enabled) or only to post-12.8 projects (setting disabled).
By default, this setting is disabled. So yes, by default, a project must have been created in GitLab 12.8 or later. Self-managed admins can choose to remove that restriction.
We could think about exposing this setting in the admin area of the cleanup policies (Menu > Admin > Settings > CI/CD > Container Registry).
@10io I'd still argue that does not fulfill the original intent of this issue - any manual configuration required by admins is unlikely to be enabled in most cases. The performance issues seem to have to do with external registries; is there no way to enable this for local registries only?