Add package file cleanup jobs

What does this MR do and why?

This is MR 2 of #345755 (closed), in which we are adding a scalable way to delete package files.

In MR 1 (!76767 (merged)) we added the PackageFile#status enum attribute, which allows us to mark package files for asynchronous destruction with a pending_destruction status. For more context, please see the comprehensive description in that MR.
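
For orientation, here is a minimal sketch of what the status enum looks like once this MR's additions (described below) are included; the concrete integer values are assumptions, not necessarily the merged code:

    module Packages
      class PackageFile < ApplicationRecord
        # default and pending_destruction come from MR 1 (!76767); processing
        # and error are added in this MR. The integer values are illustrative.
        enum status: {
          default: 0,
          pending_destruction: 1,
          processing: 2,
          error: 3
        }
      end
    end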

This second MR builds upon that work and:

  1. Adds the processing and error values to the status enum, which are used to track the async destruction status in the limited capacity cleanup worker (explained further below).
  2. Adds a cron job and a limited capacity worker to delete package files marked for destruction.
    1. The CleanupPackageRegistryWorker cron job will run twice a day. It will check whether any package files are pending destruction and, if so, enqueue the limited capacity worker.
    2. The CleanupPackageFileWorker limited capacity worker will pick up and destroy package files that are pending destruction, one by one, until all have been deleted (see the sketch after this list). A couple of important things about this worker:
      1. We use the new processing and error statuses for monitoring and for more robust error handling: if a package file fails to be destroyed for any reason, we mark it with the error status and move on.
      2. We log some statistics so that we can build a Kibana dashboard to monitor the following at any time:
        • How many package files are pending destruction.
        • How many package files are being processed for destruction.
        • How many package files have failed to be destroyed.
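
To make the division of labor concrete, here is a minimal sketch of the two workers. It assumes the shape of GitLab's LimitedCapacity::Worker concern (perform_with_capacity, perform_work, remaining_work_count, max_running_jobs); the helper details, such as how a row is claimed, are illustrative and not the exact merged code.

    module Packages
      class CleanupPackageRegistryWorker
        include ApplicationWorker
        include CronjobQueue # the usual mixin for cron workers

        # Cron entry point: only enqueue the limited capacity worker when
        # there is actual work to do.
        def perform
          return unless Packages::PackageFile.pending_destruction.exists?

          Packages::CleanupPackageFileWorker.perform_with_capacity
        end
      end

      class CleanupPackageFileWorker
        include ApplicationWorker
        include LimitedCapacity::Worker

        # Re-enqueued while remaining_work_count is positive: take one package
        # file, mark it processing, destroy it, and flag failures with the
        # error status instead of raising.
        def perform_work
          package_file = Packages::PackageFile.pending_destruction.first
          return unless package_file

          package_file.update!(status: :processing)
          package_file.destroy!
        rescue StandardError => e
          package_file&.update!(status: :error)
          Gitlab::ErrorTracking.log_exception(e, class: self.class.name)
        end

        # NOTE: a production implementation must claim rows safely across
        # concurrent jobs (e.g. SELECT ... FOR UPDATE SKIP LOCKED) rather
        # than using .first as above.

        def remaining_work_count
          Packages::PackageFile.pending_destruction.limit(max_running_jobs + 1).count
        end

        def max_running_jobs
          ::Gitlab::CurrentSettings.packages_cleanup_package_file_worker_capacity
        end
      end
    end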

It's worth noting that these new jobs are heavily based on the Dependency Proxy cleanup workers, which are already… working in Production. So, this should provide some extra confidence in this implementation 👍🏼.

It's also worth mentioning a couple improvements that may be follow-up issues:

  1. We could reuse the existing logic in the DependencyProxy::CleanupWorker concern instead of duplicating most of it for the CleanupPackageFileWorker. I've added a CONSIDER code comment about this. Update: addressing backend review feedback, I've now done this refactoring here (a rough sketch of the extracted concern follows this list).
  2. We could add a way to handle package files left with an error status. Right now, this is also something pending for the Dependency Proxy cleanup workers.
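
To illustrate follow-up 1, here is a hypothetical shape for such a shared concern; every name in it is illustrative, not the merged code:

    # Hypothetical: the loop common to the Dependency Proxy and package file
    # cleanup workers, with the including worker supplying the specifics.
    module Packages
      module CleanupArtifactWorker
        extend ActiveSupport::Concern
        include LimitedCapacity::Worker

        def perform_work
          artifact = next_item
          return unless artifact

          artifact.update!(status: :processing)
          artifact.destroy!
        rescue StandardError => e
          artifact&.update!(status: :error)
          Gitlab::ErrorTracking.log_exception(e, class: self.class.name)
        end

        def remaining_work_count
          model.pending_destruction.limit(max_running_jobs + 1).count
        end

        private

        # To be implemented by the including worker: the model to clean up.
        def model
          raise NotImplementedError
        end

        # To be implemented by the including worker: claim the next record
        # pending destruction.
        def next_item
          raise NotImplementedError
        end
      end
    end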

Screenshots or screen recordings

N/A

How to set up and validate locally

  1. Set up a project to use its Package Registry.
  2. Publish a bunch of generic package files:
    echo 'destroy all the things' > destroy.txt
    curl --header "PRIVATE-TOKEN: <PAT>" --upload-file ./destroy.txt "http://gdk.test:3000/api/v4/projects/<id>/packages/generic/destroy/0.0.1/destroy-1.txt"
    curl --header "PRIVATE-TOKEN: <PAT>" --upload-file ./destroy.txt "http://gdk.test:3000/api/v4/projects/<id>/packages/generic/destroy/0.0.1/destroy-2.txt"
    curl --header "PRIVATE-TOKEN: <PAT>" --upload-file ./destroy.txt "http://gdk.test:3000/api/v4/projects/<id>/packages/generic/destroy/0.0.1/destroy-3.txt"
    curl --header "PRIVATE-TOKEN: <PAT>" --upload-file ./destroy.txt "http://gdk.test:3000/api/v4/projects/<id>/packages/generic/destroy/0.0.1/destroy-4.txt"
    curl --header "PRIVATE-TOKEN: <PAT>" --upload-file ./destroy.txt "http://gdk.test:3000/api/v4/projects/<id>/packages/generic/destroy/0.0.1/destroy-5.txt"
  3. In a Rails console session, mark the package files as pending_destruction and run the cleanup cron job. The cron job will enqueue the limited capacity job to delete all the package files. When using GDK, rails-background-jobs should pick up and run the enqueued job (you can monitor this locally with gdk tail rails-background-jobs). After the enqueued job has run, you can confirm that the package files have been destroyed:
    package_files = Packages::Package.generic.with_name('destroy').with_version('0.0.1').last.package_files
    package_files.count
    # => 5
    package_files.update_all(status: :pending_destruction)
    # => 5
    Packages::CleanupPackageRegistryWorker.new.perform
    package_files.count
    # => 0

Database review

Migration up

== 20211224112937 AddPackagesCleanupPackageFileWorkerCapacityToApplicationSettings: migrating 
-- add_column(:application_settings, :packages_cleanup_package_file_worker_capacity, :smallint, {:default=>2, :null=>false})
   -> 0.0028s
== 20211224112937 AddPackagesCleanupPackageFileWorkerCapacityToApplicationSettings: migrated (0.0029s) 
== 20211224114539 AddPackagesCleanupPackageFileWorkerCapacityCheckConstraintToAppSettings: migrating 
-- transaction_open?()
   -> 0.0000s
-- current_schema()
   -> 0.0003s
-- transaction_open?()
   -> 0.0000s
-- execute("ALTER TABLE application_settings\nADD CONSTRAINT app_settings_p_cleanup_package_file_worker_capacity_positive\nCHECK ( packages_cleanup_package_file_worker_capacity >= 0 )\nNOT VALID;\n")
   -> 0.0010s
-- current_schema()
   -> 0.0003s
-- execute("SET statement_timeout TO 0")
   -> 0.0003s
-- execute("ALTER TABLE application_settings VALIDATE CONSTRAINT app_settings_p_cleanup_package_file_worker_capacity_positive;")
   -> 0.0008s
-- execute("RESET statement_timeout")
   -> 0.0003s
== 20211224114539 AddPackagesCleanupPackageFileWorkerCapacityCheckConstraintToAppSettings: migrated (0.0119s) 

== 20220114131950 AddStatusOnlyIndexToPackagesPackageFiles: migrating =========
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:packages_package_files, :status, {:name=>"index_packages_package_files_on_status", :algorithm=>:concurrently})
   -> 0.0065s
-- execute("SET statement_timeout TO 0")
   -> 0.0006s
-- add_index(:packages_package_files, :status, {:name=>"index_packages_package_files_on_status", :algorithm=>:concurrently})
   -> 0.0048s
-- execute("RESET statement_timeout")
   -> 0.0009s
== 20220114131950 AddStatusOnlyIndexToPackagesPackageFiles: migrated (0.0150s)

Migration down

== 20211224114539 AddPackagesCleanupPackageFileWorkerCapacityCheckConstraintToAppSettings: reverting 
-- transaction_open?()
   -> 0.0000s
-- transaction_open?()
   -> 0.0000s
-- execute("ALTER TABLE application_settings\nDROP CONSTRAINT IF EXISTS app_settings_p_cleanup_package_file_worker_capacity_positive\n")
   -> 0.0014s
== 20211224114539 AddPackagesCleanupPackageFileWorkerCapacityCheckConstraintToAppSettings: reverted (0.0089s) 
== 20211224112937 AddPackagesCleanupPackageFileWorkerCapacityToApplicationSettings: reverting 
-- remove_column(:application_settings, :packages_cleanup_package_file_worker_capacity, :smallint, {:default=>2, :null=>false})
   -> 0.0020s
== 20211224112937 AddPackagesCleanupPackageFileWorkerCapacityToApplicationSettings: reverted (0.0044s) 

== 20220114131950 AddStatusOnlyIndexToPackagesPackageFiles: reverting =========
-- transaction_open?()
   -> 0.0000s
-- indexes(:packages_package_files)
   -> 0.0051s
-- execute("SET statement_timeout TO 0")
   -> 0.0005s
-- remove_index(:packages_package_files, {:algorithm=>:concurrently, :name=>"index_packages_package_files_on_status"})
   -> 0.0032s
-- execute("RESET statement_timeout")
   -> 0.0005s
== 20220114131950 AddStatusOnlyIndexToPackagesPackageFiles: reverted (0.0112s) 
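
Reconstructed from the output above, the three migrations look roughly like this (assuming GitLab's standard migration helpers; this is a sketch, not the exact files):

    class AddPackagesCleanupPackageFileWorkerCapacityToApplicationSettings < Gitlab::Database::Migration[1.0]
      def change
        add_column :application_settings, :packages_cleanup_package_file_worker_capacity,
          :smallint, default: 2, null: false
      end
    end

    class AddPackagesCleanupPackageFileWorkerCapacityCheckConstraintToAppSettings < Gitlab::Database::Migration[1.0]
      disable_ddl_transaction!

      CONSTRAINT_NAME = 'app_settings_p_cleanup_package_file_worker_capacity_positive'

      def up
        # Adds the constraint NOT VALID and then validates it, matching the log above.
        add_check_constraint :application_settings,
          'packages_cleanup_package_file_worker_capacity >= 0', CONSTRAINT_NAME
      end

      def down
        remove_check_constraint :application_settings, CONSTRAINT_NAME
      end
    end

    class AddStatusOnlyIndexToPackagesPackageFiles < Gitlab::Database::Migration[1.0]
      disable_ddl_transaction!

      INDEX_NAME = 'index_packages_package_files_on_status'

      def up
        add_concurrent_index :packages_package_files, :status, name: INDEX_NAME
      end

      def down
        remove_concurrent_index_by_name :packages_package_files, INDEX_NAME
      end
    end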

Explain plans

Similar to !76767 (merged), for the explain plans I've used a package from gitlab.com that has ~26K package files.

I've also run the following to set up the existing data before each query. It updates the status of the package files for my target package to a random value: 0 (default) or 1 (pending_destruction).

exec UPDATE packages_package_files SET status = floor(random() * 2) WHERE package_id = XXX;
exec VACUUM ANALYZE packages_package_files;
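
For reference, this is the shape of the lookup the new status-only index serves (assuming pending_destruction maps to 1, as in the setup command above):

    Packages::PackageFile.pending_destruction.limit(1).to_sql
    # => SELECT "packages_package_files".* FROM "packages_package_files"
    #    WHERE "packages_package_files"."status" = 1 LIMIT 1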

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #345755 (closed)
