Improve cleanup policies selection during their execution
🌲 Context
Cleanup policies are executed on a regular basis by a cron job. During this execution, the cron job selects the cleanup policies that need to be executed using the .runnable
scope.
The issue is that this scope selects a far too big set (700K+ rows).
Why do we have such a high set, it's because cleanup policies are created when a new project is created and by default the policy is enabled = it will be picked up by the cron job for execution.
🔍 What does this MR do?
As explained in #263110 (closed), we're going to reduce the size of the selected scope by selecting only policies that are actually linked to container repositories. Policies with no container repositories don't need to be executed as their execution is a no-op.
This MR will:
- add a new
.executable
scope in theContainerExpirationPolicy
- this scope will select policies that are
runnable_schedules
and have at least one container repository.
- this scope will select policies that are
- add
.each_bactch
support toContainerExpirationPolicy
so that we can iterate on a large set more efficiently - add a new index to improve the ~performance of
.executable.each_batch
- update the relevant specs
💽 Database considerations
Before this MR, the cron worker was using .find_each
to loop on the policies set. As stated in the documentation:
NOTE: It's not possible to set the order. That is automatically set to ascending on the primary key (“id ASC”) to make the batch ordering work. This also means that this method only works when the primary key is orderable (e.g. an integer or string).
So a order by policy_id
will be imposed and here an example of the resulting query:
SELECT "container_expiration_policies".*
FROM "container_expiration_policies"
WHERE "container_expiration_policies"."enabled" = TRUE
AND (next_run_at < '2020-10-09 08:40:06.780007')
ORDER BY "container_expiration_policies"."project_id" ASC
LIMIT 1000;
Here is the explain plan: https://explain.depesz.com/s/XxqE . We can notice that the index index_container_expiration_policies_on_next_run_at_and_enabled
is not used at all. Thanks to @sabrams explanations ORDER BY "container_expiration_policies"."project_id" ASC
imposed by .find_each
To fix that we're going to add a new index on columns project_id
, next_run_at
and enabled
.
This MR will modify the query to include a check for ContainerRepository
existence and so we end up with a query like this one:
SELECT "container_expiration_policies"."project_id"
FROM "container_expiration_policies"
WHERE "container_expiration_policies"."enabled" = TRUE
AND (next_run_at < '2020-10-09 09:00:45.082074')
AND (EXISTS
(SELECT 1
FROM "container_repositories"
WHERE (container_repositories.project_id = container_expiration_policies.project_id)))
ORDER BY "container_expiration_policies"."project_id" ASC
LIMIT 1000
Here is the explain plan: https://explain.depesz.com/s/bUbP. This time, we can see that the new index based on the 3 columns is properly used.
🔼 Migration up
== 20201009090954 AddIndexWithProjectIdToContainerExpirationPolicies: migrating
-- transaction_open?()
-> 0.0000s
-- index_exists?(:container_expiration_policies, [:project_id, :next_run_at, :enabled], {:name=>"idx_container_exp_policies_on_project_id_next_run_at_enabled", :algorithm=>:concurrently})
-> 0.0020s
-- execute("SET statement_timeout TO 0")
-> 0.0002s
-- add_index(:container_expiration_policies, [:project_id, :next_run_at, :enabled], {:name=>"idx_container_exp_policies_on_project_id_next_run_at_enabled", :algorithm=>:concurrently})
-> 0.0030s
-- execute("RESET ALL")
-> 0.0002s
== 20201009090954 AddIndexWithProjectIdToContainerExpirationPolicies: migrated (0.0057s)
🔽 Migration down
== 20201009090954 AddIndexWithProjectIdToContainerExpirationPolicies: reverting
-- transaction_open?()
-> 0.0000s
-- indexes(:container_expiration_policies)
-> 0.0030s
-- execute("SET statement_timeout TO 0")
-> 0.0002s
-- remove_index(:container_expiration_policies, {:algorithm=>:concurrently, :name=>"idx_container_exp_policies_on_project_id_next_run_at_enabled"})
-> 0.0017s
-- execute("RESET ALL")
-> 0.0002s
== 20201009090954 AddIndexWithProjectIdToContainerExpirationPolicies: reverted (0.0054s)
Screenshots
n / a
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry -
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers -
Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team