Disable cleanup policies linked to no container repositories
🔥 Problem
At the time of this writing, on gitlab.com we have the following situation:
- ~3 millions (internal) cleanup policies that are enabled but the linked project has not container images
- ~6000(internal) cleanup policies that are enabled and the linked project has at least one container image
In a upcoming work, we're going to add an index that specifically targets enabled cleanup policies. This is to help the background workers to scan through them.
The second set of cleanup policies is really what the background workers are interested in. By having that heavy first set of unrelated policies, the index will not be working properly as background workers will only use 0.2% of the data in the index.
🚒 Solution
We can't build the index in such way that it selects only the enabled policies that are linked to at least on container image.
The only solution is thus, disabling those cleanup policies (the ones from the first set). By doing so, they will be removed from the index, the amount of data hold in the index will be heavily reduced and we expect to have a much more efficient index.
There are two ways to disable those policies:
- A post migration. This is a one shot solution as a post migration is executed only when it is deployed and that's it.
- Add this as a step on the cron job of the cleanup policies. This way, each time the job is executed, we make sure that we don't have the first set.
- This needs to be properly document because it means that users deleting container images could disable the associated cleanup policy if they delete all container images.
⛑ Comparison
Let's image that we want to build a query like this one
WITH policies AS (
SELECT "container_expiration_policies"."project_id"
FROM "container_expiration_policies"
WHERE "container_expiration_policies"."enabled" = TRUE
AND "container_expiration_policies"."next_run_at" < NOW()
)
SELECT container_repositories.id
FROM container_repositories
WHERE "container_repositories"."project_id" IN (
SELECT "policies"."project_id"
FROM policies
)
(We're pulling all the container image ids that are like to enabled and runnable cleanup policies)
Here is what we get the index holding both sets: https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/3947/commands/13576 (internal). Using the index is not helpful, we get 14s
of execution time.
Here is what we have with the index holding only the second set: https://console.postgres.ai/gitlab/gitlab-production-tunnel/sessions/3948/commands/13588 (internal). 20ms of execution time. So
📻 Other considerations
We're disabling the cleanup policies on the users behalf. This change should be properly communicated.
🔨 SQL UPDATE
UPDATE container_expiration_policies SET enabled=FALSE WHERE "container_expiration_policies"."enabled" = TRUE and NOT (EXISTS (SELECT 1 FROM "container_repositories" WHERE (container_repositories.project_id = container_expiration_policies.project_id)))
Breaking change for 14.0
The 14.0 breaking change blog post has been updated to include a note for this change: gitlab-com/www-gitlab-com!79292 (closed)