
Add background jobs for cleanup policies for packages

David Fernandez requested to merge 346153-background-job into master

Context

The Packages Registry works with these core models (simplified):

flowchart LR
    Group -- 1:n --> Project
    Project -- 1:n --> Package
    Package -- 1:n --> PackageFile
    PackageFile -- 1:1 --> os((Object Storage file))

For some package formats, we allow republishing a package version. When that happens, the new package files are appended to the existing package.

Over time, some packages can end up with many package files, all of which take up space on Object Storage.

With #346153 (closed), we're kicking off the work of adding cleanup policies for packages. Users will be able to define cleanup rules, and the backend will regularly execute each policy to remove the packages/package files that the policy does not keep.

In true iteration spirit, the first iteration will have a single rule. Users will be able to define how many duplicated package files (by filename) need to be kept.

Example: for a Maven package, a pom.xml is uploaded on each publication. If you publish the same version 100 times, you end up with 100 pom.xml package files. Users will be able to state that they only want to keep the 10 most recent pom.xml files.
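The rule above can be sketched in plain Ruby. This is an illustration only, not GitLab's actual implementation — `PackageFile` and `apply_keep_n_duplicated` are hypothetical stand-ins for the real models and service logic:

```ruby
# Illustrative sketch of the single rule in this first iteration: among
# package files sharing a filename, keep only the N most recent and mark
# the rest for destruction. Names here are hypothetical.
PackageFile = Struct.new(:file_name, :created_at, :status)

def apply_keep_n_duplicated(package_files, keep_n)
  package_files.group_by(&:file_name).each_value do |duplicates|
    # Newest first; everything past the first N is marked pending_destruction.
    duplicates.sort_by(&:created_at).reverse.drop(keep_n).each do |file|
      file.status = :pending_destruction
    end
  end
  package_files
end
```

With `keep_n = 1`, three `pom.xml` files would be reduced to the single most recent one, matching the behavior validated in the local setup steps below.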

For this feature, there are several backend parts:

  1. The policy model. That's !85918 (merged).
  2. Expose the policy object through GraphQL. That's !87799 (merged).
  3. The execute policy service. That's !90395 (merged).
  4. The background job that executes the cleanup policies (through the service added in (3.)). 👈 This MR

This is issue #346153 (closed).

As stated above, this MR focuses on introducing the background worker changes that execute policies. The execution itself is handled by the service introduced in !90395 (merged).

Here, the goal is to collect the policies that need to be run and execute them one by one. For this, we're going to leverage the LimitedCapacity::Worker concern. As time passes, a backlog of policies that need to be executed builds up (each policy has a next_run_at column). This backlog is processed by a number of concurrent jobs that re-enqueue themselves until the backlog is empty. The number of concurrent jobs is a new application setting, so that we can fine-tune the pressure on Sidekiq and self-managed admins can adjust this number to their setup.

This is all great, but we need a way to kickstart the "loop" of self-enqueueing jobs. For this, we're going to use a cron job that regularly checks whether there are policies to execute. If there are, it enqueues the limited capacity job that starts the "loop".
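The cron kickstart plus self-enqueueing loop can be simulated in a few lines of plain Ruby. This is a simplified, self-contained sketch of the pattern, not the actual LimitedCapacity::Worker concern or its API — `BacklogDrainer`, `cron_tick`, and `perform_work` are hypothetical names:

```ruby
# Simplified simulation of the pattern described above: a cron tick starts
# up to `capacity` jobs, and each job processes one policy, then "loops on
# itself" (re-enqueues) while the backlog is non-empty.
class BacklogDrainer
  def initialize(backlog, capacity)
    @backlog  = backlog   # policies whose next_run_at is in the past
    @capacity = capacity  # stand-in for the new application setting
    @executed = []
  end

  attr_reader :executed

  def cron_tick
    return if @backlog.empty? # nothing runnable: don't start the loop

    [@capacity, @backlog.size].min.times { perform_work }
  end

  private

  def perform_work
    policy = @backlog.shift
    return unless policy

    @executed << policy # stand-in for running the execute policy service
    perform_work unless @backlog.empty? # re-enqueue until the backlog is empty
  end
end
```

In the real worker the loop runs as concurrent Sidekiq jobs rather than recursion, but the shape is the same: the cron job only primes the pump, and capacity caps how many jobs run at once.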

🔬 What does this MR do and why?

  • Introduce the Packages::Cleanup::ExecutePolicyWorker as a LimitedCapacity::Worker.
  • The capacity for this worker is set by a newly introduced application setting: package_registry_cleanup_policies_worker_capacity.
    • Expose this application setting on the usual endpoint.
  • That worker will use the existing Packages::Cleanup::ExecuteService.
  • Update the Packages::CleanupPackageRegistryWorker worker to kick off Packages::Cleanup::ExecutePolicyWorker if necessary.
    • This parent worker will also report metrics on policies (how many are runnable).
  • Update the cleanup policy model to support the background worker.
  • Add the relevant database index to get the runnable cleanup policies.
  • Update the relevant specs.
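The "runnable" condition that the new partial index supports can be expressed as a simple predicate. This is a hypothetical pure-Ruby rendering (the real code would be an ActiveRecord scope); the struct and method names are illustrative:

```ruby
# Hypothetical version of the condition backed by the new partial index:
# the policy is due (next_run_at has passed) and enabled
# (keep_n_duplicated_package_files is not 'all').
Policy = Struct.new(:project_id, :next_run_at, :keep_n_duplicated_package_files)

def runnable?(policy, now: Time.now)
  policy.next_run_at <= now && policy.keep_n_duplicated_package_files != 'all'
end
```

This mirrors the index added in the migration below, which covers (next_run_at, project_id) with the condition keep_n_duplicated_package_files <> 'all'.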

🖼 Screenshots or screen recordings

n / a

📐 How to set up and validate locally

We're going to create a bunch of dummy packages with duplicated package files. We will then create a packages cleanup policy that keeps only 1 duplicated package file, so only the most recent one is kept.

We don't want to wait until the policy's next_run_at for it to become executable, so we will modify that column to make the policy runnable.

Finally, we will run the cron job that will kick off the limited capacity job. That job will execute our policy and mark for destruction the intended package files.

Let's get started. In a rails console:

  1. Follow this to define a fixture_file_upload function.
  2. Let's create 3 packages:
    project = Project.first
    pkg1 = FactoryBot.create(:generic_package, project: project)
    pkg2 = FactoryBot.create(:generic_package, project: project)
    pkg3 = FactoryBot.create(:generic_package, project: project)
  3. Let's add some dummy files:
    FactoryBot.create(:package_file, :generic, package: pkg1, file_name: 'file_for_pkg1.txt')
    2.times { FactoryBot.create(:package_file, :generic, package: pkg2, file_name: 'file_for_pkg2.txt') }
    3.times { FactoryBot.create(:package_file, :generic, package: pkg3, file_name: 'file_for_pkg3.txt') }
  4. Check the created files (look at the status column):
    pkg1.reload.package_files
    pkg2.reload.package_files
    pkg3.reload.package_files
  5. Create the packages cleanup policy that will keep only 1 duplicated package file:
    project.packages_cleanup_policy.update!(keep_n_duplicated_package_files: '1')
    policy = project.packages_cleanup_policy
  6. Make the policy runnable. We need to use update_column because a save callback updates next_run_at for the next execution, and we need to avoid triggering it.
    policy.update_column(:next_run_at, 2.minutes.ago)
  7. Run the cron job
    Packages::CleanupPackageRegistryWorker.new.perform
  8. Let's re-inspect the files:
    pkg1.reload.package_files
    pkg2.reload.package_files
    pkg3.reload.package_files
  9. The most recent package file has status: 'default' and all the others have status: 'pending_destruction'. That's the expected behavior.

🚦 MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

💾 Database review

Migration up

main: == 20220713175658 AddPackagesCleanupPoliciesWorkerCapacityToApplicationSettings: migrating 
main: -- add_column(:application_settings, :package_registry_cleanup_policies_worker_capacity, :integer, {:default=>2, :null=>false})
main:    -> 0.0042s
main: == 20220713175658 AddPackagesCleanupPoliciesWorkerCapacityToApplicationSettings: migrated (0.0050s) 

main: == 20220713175737 AddApplicationSettingsPackagesCleanupPoliciesWorkerCapacityConstraint: migrating 
main: -- transaction_open?()
main:    -> 0.0000s
main: -- current_schema()
main:    -> 0.0009s
main: -- transaction_open?()
main:    -> 0.0000s
main: -- execute("ALTER TABLE application_settings\nADD CONSTRAINT app_settings_pkg_registry_cleanup_pol_worker_capacity_gte_zero\nCHECK ( package_registry_cleanup_policies_worker_capacity >= 0 )\nNOT VALID;\n")
main:    -> 0.0022s
main: -- current_schema()
main:    -> 0.0003s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0003s
main: -- execute("ALTER TABLE application_settings VALIDATE CONSTRAINT app_settings_pkg_registry_cleanup_pol_worker_capacity_gte_zero;")
main:    -> 0.0013s
main: -- execute("RESET statement_timeout")
main:    -> 0.0003s
main: == 20220713175737 AddApplicationSettingsPackagesCleanupPoliciesWorkerCapacityConstraint: migrated (0.0193s) 

main: == 20220713175812 AddEnabledPoliciesIndexToPackagesCleanupPolicies: migrating =
main: -- transaction_open?()
main:    -> 0.0000s
main: -- index_exists?(:packages_cleanup_policies, [:next_run_at, :project_id], {:where=>"keep_n_duplicated_package_files <> 'all'", :name=>"idx_enabled_pkgs_cleanup_policies_on_next_run_at_project_id", :algorithm=>:concurrently})
main:    -> 0.0025s
main: -- add_index(:packages_cleanup_policies, [:next_run_at, :project_id], {:where=>"keep_n_duplicated_package_files <> 'all'", :name=>"idx_enabled_pkgs_cleanup_policies_on_next_run_at_project_id", :algorithm=>:concurrently})
main:    -> 0.0029s
main: == 20220713175812 AddEnabledPoliciesIndexToPackagesCleanupPolicies: migrated (0.0123s) 

Migration down

main: == 20220713175812 AddEnabledPoliciesIndexToPackagesCleanupPolicies: reverting =
main: -- transaction_open?()
main:    -> 0.0000s
main: -- indexes(:packages_cleanup_policies)
main:    -> 0.0052s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0003s
main: -- remove_index(:packages_cleanup_policies, {:algorithm=>:concurrently, :name=>"idx_enabled_pkgs_cleanup_policies_on_next_run_at_project_id"})
main:    -> 0.0035s
main: -- execute("RESET statement_timeout")
main:    -> 0.0003s
main: == 20220713175812 AddEnabledPoliciesIndexToPackagesCleanupPolicies: reverted (0.0167s) 

main: == 20220713175737 AddApplicationSettingsPackagesCleanupPoliciesWorkerCapacityConstraint: reverting 
main: -- transaction_open?()
main:    -> 0.0000s
main: -- transaction_open?()
main:    -> 0.0000s
main: -- execute("ALTER TABLE application_settings\nDROP CONSTRAINT IF EXISTS app_settings_pkg_registry_cleanup_pol_worker_capacity_gte_zero\n")
main:    -> 0.0018s
main: == 20220713175737 AddApplicationSettingsPackagesCleanupPoliciesWorkerCapacityConstraint: reverted (0.0121s) 

main: == 20220713175658 AddPackagesCleanupPoliciesWorkerCapacityToApplicationSettings: reverting 
main: -- remove_column(:application_settings, :package_registry_cleanup_policies_worker_capacity, :integer, {:default=>2, :null=>false})
main:    -> 0.0035s
main: == 20220713175658 AddPackagesCleanupPoliciesWorkerCapacityToApplicationSettings: reverted (0.0056s) 

📊 Queries

  1. Runnable policy, limited count
  2. First runnable policy
  3. Runnable policy existence
  4. Runnable policy "full" count
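The four query shapes above can be illustrated over an in-memory collection. This is a sketch only — the real code would use ActiveRecord scopes backed by the partial index added in this MR, and `CleanupPolicy` here is a hypothetical stand-in:

```ruby
# Illustrative versions of the four runnable-policy queries, over plain
# Ruby objects instead of ActiveRecord. Names are hypothetical.
CleanupPolicy = Struct.new(:project_id, :next_run_at, :keep_n)

now = Time.at(1_000_000)
policies = [
  CleanupPolicy.new(1, now - 60, '1'),
  CleanupPolicy.new(2, now - 30, 'all'), # disabled: never runnable
  CleanupPolicy.new(3, now + 60, '1'),   # not due yet
  CleanupPolicy.new(4, now - 10, '10')
]

runnable = policies
  .select { |p| p.next_run_at <= now && p.keep_n != 'all' }
  .sort_by(&:next_run_at)

limited_count = runnable.first(2).size # 1. runnable policy, limited count
first_policy  = runnable.first         # 2. first runnable policy
any_runnable  = !runnable.empty?       # 3. runnable policy existence
full_count    = runnable.size          # 4. runnable policy "full" count
```

All four shapes share the same predicate, which is why a single partial index on (next_run_at, project_id) filtered on keep_n_duplicated_package_files <> 'all' can serve them.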
