
Add package file status attribute and introduce the installable scope

David Fernandez requested to merge 345755-package-files-pending-destruction into master

🎉 MR 76767 🎉 Woohoo! It's mine! Mine! (sorry about that)

🤹 Context

Over time, users of the package registry fill it with many packages and package files.

These objects count towards object storage usage, so users need a way to manage that storage efficiently. To address this, we're introducing cleanup policies for packages.

A cleanup policy is a set of parameters read by a recurring background job. Those parameters target packages or package files and remove them from the package registry. For example, a policy could be: for a given package name, keep the 10 most recent versions and remove everything else.
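To make the example policy concrete, here is a minimal plain-Ruby sketch of the selection it describes. `PackageVersion` and `versions_to_remove` are hypothetical names used only for illustration. They are not part of the GitLab codebase, and `created_at` is simplified to an Integer timestamp.

```ruby
# Hypothetical record: created_at is an Integer timestamp for simplicity.
PackageVersion = Struct.new(:name, :version, :created_at)

# Selects what a "keep the N most recent versions" policy would remove
# for a single package name.
def versions_to_remove(packages, name:, keep: 10)
  packages
    .select { |pkg| pkg.name == name } # only the targeted package name
    .sort_by { |pkg| -pkg.created_at } # newest first
    .drop(keep)                        # everything after the N newest
end

packages = (1..12).map { |i| PackageVersion.new("my_package", "0.0.#{i}", i) }
packages << PackageVersion.new("other", "1.0.0", 100)
versions_to_remove(packages, name: "my_package", keep: 10).map(&:version)
# => ["0.0.2", "0.0.1"]
```

Packages with other names are never touched, which matches the per-package-name scope of the example policy.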

Removing package files

Removing package files is not a trivial operation. Each package file is linked to a physical file on object storage.

This means that we cannot simply issue a single DELETE statement against the database and be done. We need to instantiate the ActiveRecord object and call #destroy! (or one of its variants) on it. This way, the CarrierWave callbacks kick in and issue a DELETE request to object storage.
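The difference between a bulk DELETE and a per-object destroy can be shown with a small in-memory model. This sketch assumes nothing about ActiveRecord or CarrierWave internals. `FakeObjectStorage`, `FakePackageFile` and `bulk_delete` are hypothetical stand-ins that only mimic the observable effect: a destroy runs a callback that removes the physical file, and a bulk delete does not.

```ruby
require "set"

# Hypothetical stand-in for object storage: a set of stored file keys.
class FakeObjectStorage
  attr_reader :keys

  def initialize
    @keys = Set.new
  end

  def put(key)
    @keys.add(key)
  end

  def delete(key)
    @keys.delete(key)
  end
end

# Hypothetical stand-in for a package file row with an attached upload.
class FakePackageFile
  attr_reader :id, :file_key

  def initialize(id, file_key, table, storage)
    @id = id
    @file_key = file_key
    @table = table
    @storage = storage
  end

  # Like #destroy!: removes the row, then runs the callback that deletes
  # the physical file from object storage.
  def destroy!
    @table.delete(id)
    @storage.delete(file_key)
  end
end

# Like a single bulk DELETE statement: rows vanish, no callbacks run.
def bulk_delete(table)
  table.clear
end

storage = FakeObjectStorage.new
table = {}
[1, 2].each do |id|
  key = "packages/#{id}/file.txt"
  storage.put(key)
  table[id] = FakePackageFile.new(id, key, table, storage)
end

table[1].destroy!  # row 1 and its file are both gone
bulk_delete(table) # row 2 is gone, but its file is left behind
storage.keys.to_a  # => ["packages/2/file.txt"]
```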

Because cleanup policies can delete massive amounts of package files, we decided to use a "soft" delete. By soft delete, we mean that the full destruction of a package file happens in two steps:

  1. The package file is "marked" for destruction.
    • This makes the package file unavailable in the UI and APIs.
    • No interaction with object storage happens during this step.
  2. The package file is actually destroyed.
    • This is handled by a separate background worker that takes the list of "marked" package files as its backlog.
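The two steps can be sketched in memory as follows. The method names `mark_for_destruction` and `destroy_pending_batch` and the batch size are hypothetical. The sketch only illustrates the split between a cheap status update and a batched worker. It is not the worker that the plan will actually introduce.

```ruby
# Hypothetical, in-memory model of the two-step destruction.
PackageFile = Struct.new(:id, :status)

# Step 1: flip the status only; nothing touches object storage here.
def mark_for_destruction(files, ids)
  files.each { |file| file.status = :pending_destruction if ids.include?(file.id) }
end

# Step 2: a background worker takes a batch from the "marked" backlog and
# destroys it. Here that means removing the entries from the array; in the
# application it would mean calling #destroy! on each record.
def destroy_pending_batch(files, batch_size:)
  batch_ids = files.select { |file| file.status == :pending_destruction }
                   .first(batch_size)
                   .map(&:id)
  files.reject! { |file| batch_ids.include?(file.id) }
  batch_ids
end

files = (1..5).map { |i| PackageFile.new(i, :default) }
mark_for_destruction(files, [2, 3, 4])
destroy_pending_batch(files, batch_size: 2) # => [2, 3]
destroy_pending_batch(files, batch_size: 2) # => [4]
files.map(&:id)                             # => [1, 5]
```

Processing the backlog in bounded batches keeps each worker run short, even when a policy marks a very large number of files at once.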

The above plan is described in issue #345755 (closed).

This MR represents the very first step of the above plan:

  • Add a status column to the package files table.
  • Make sure that the UI and APIs can't fetch non-default package files.

🔬 What does this MR do and why?

  • Add a status column to the packages_package_files table
    • That status is mapped to a Rails enum.
    • Right now, there are only two values: default and pending_destruction.
    • A Rails enum is better than a simple boolean column because it leaves room for future evolutions. For example, package files could one day be imported and need to be marked as such.
  • Add a scope .installable that will select only default package files for now.
    • That scope is similar to what we have for packages.
  • Update all the finders/services to use the introduced scope.
  • Update all the related specs.
  • Because these changes are broad (all package formats are impacted) and can introduce performance issues (for example, preloads that no longer work, leading to N+1 queries 💥), all the changes are gated behind a feature flag: packages_installable_package_files.
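As a rough illustration of how the enum, the installable filter and the feature flag fit together, here is a plain-Ruby approximation. It is not ActiveRecord code. The integer mapping `default: 0, pending_destruction: 1` is an assumption that is consistent with the column default of 0. `FLAGS` and `installable` are hypothetical stand-ins for the feature flag check and the scope.

```ruby
# Raw smallint values for each status; 0 matches the column default.
# Treat this mapping as an assumption.
STATUSES = { default: 0, pending_destruction: 1 }.freeze

# Hypothetical row shape: status holds the raw smallint stored in the table.
PackageFileRow = Struct.new(:id, :package_id, :status) do
  # Maps the raw value back to its name, similar to how a Rails enum
  # exposes the status name instead of the integer.
  def status_name
    STATUSES.key(status)
  end
end

# Hypothetical stand-in for the feature flag lookup.
FLAGS = { packages_installable_package_files: true }

# Like the .installable scope: keep only files in the default status,
# but only while the feature flag is enabled.
def installable(rows)
  return rows unless FLAGS[:packages_installable_package_files]

  rows.select { |row| row.status == STATUSES[:default] }
end

rows = [PackageFileRow.new(1, 42, 0), PackageFileRow.new(2, 42, 1), PackageFileRow.new(3, 42, 0)]
rows[1].status_name         # => :pending_destruction
installable(rows).map(&:id) # => [1, 3]

FLAGS[:packages_installable_package_files] = false
installable(rows).map(&:id) # => [1, 2, 3]
```

Adding a third status later (for example, for imported files) only means adding a key to the mapping. A boolean column could not absorb that change.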

📷 Screenshots or screen recordings

I'm going to show a usage example with the generic packages registry, but these changes apply to all package formats.

  1. Upload a new generic package file
    $ cat dummy.txt 
    bananas
    $ curl --header "PRIVATE-TOKEN: <PAT>" --upload-file ./dummy.txt "http://gdk.test:8000/api/v4/projects/509/packages/generic/my_package/0.0.1/file.txt"
    {"message":"201 Created"}
  2. Pull it
    $ curl --header "PRIVATE-TOKEN: <PAT>" "http://gdk.test:8000/api/v4/projects/509/packages/generic/my_package/0.0.1/file.txt"
    bananas
  3. Mark it as pending_destruction (in a Rails console)
    Packages::PackageFile.last.pending_destruction!
  4. Pull it
    $ curl --header "PRIVATE-TOKEN: <PAT>" "http://gdk.test:8000/api/v4/projects/509/packages/generic/my_package/0.0.1/file.txt"
    {"message":"404 Not found"}
    • Not found because pending_destruction package files are not available in APIs.
  5. To complete this demo, let's disable the feature flag (in a Rails console)
    Feature.disable(:packages_installable_package_files)
  6. Let's pull the package file
    $ curl --header "PRIVATE-TOKEN: <PAT>" "http://gdk.test:8000/api/v4/projects/509/packages/generic/my_package/0.0.1/file.txt"
    bananas
    • Pending destruction package file found and returned because the feature flag is disabled.
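The observable behaviour in the demo above can be summarized by a small model. `StoredFile` and `pull` are hypothetical names. The sketch only reproduces the responses seen at each step. It does not reflect how the GitLab API code is structured.

```ruby
# Hypothetical model of the observable behaviour in the demo.
StoredFile = Struct.new(:name, :body, :status)

NOT_FOUND = '{"message":"404 Not found"}'.freeze

# Returns [http_status, body] for a pull of the file called +name+.
def pull(files, name, flag_enabled:)
  candidates = files.select { |file| file.name == name }
  # With the flag on, pending_destruction files are filtered out.
  candidates = candidates.select { |file| file.status == :default } if flag_enabled
  file = candidates.last
  file ? [200, file.body] : [404, NOT_FOUND]
end

files = [StoredFile.new("file.txt", "bananas\n", :default)]
pull(files, "file.txt", flag_enabled: true)  # step 2 => [200, "bananas\n"]
files.last.status = :pending_destruction     # step 3
pull(files, "file.txt", flag_enabled: true)  # step 4 => [404, NOT_FOUND]
pull(files, "file.txt", flag_enabled: false) # step 6 => [200, "bananas\n"]
```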

How to set up and validate locally

  1. Set up the package registry (any format).
  2. Create and upload a new package.
  3. Enable the feature flag: Feature.enable(:packages_installable_package_files)
  4. Mark the package file as pending_destruction: Packages::PackageFile.last.pending_destruction!
  5. Try to pull the package file. The request should return 404 Not found.
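For step 5, the pull request from the demo can also be built from Ruby's standard library instead of curl. This is a minimal sketch. `generic_package_file_request` is a hypothetical helper, and the base URL, project ID and token are the demo's placeholders, which you should replace with your own GDK values.

```ruby
require "net/http"
require "uri"

# Builds, without sending, the GET request used in the demo to pull a
# generic package file. All argument values are placeholders.
def generic_package_file_request(base_url:, project_id:, package:, version:, file:, token:)
  path = "/api/v4/projects/#{project_id}/packages/generic/#{package}/#{version}/#{file}"
  uri = URI("#{base_url}#{path}")
  request = Net::HTTP::Get.new(uri)
  request["PRIVATE-TOKEN"] = token
  [uri, request]
end

uri, request = generic_package_file_request(
  base_url: "http://gdk.test:8000", project_id: 509,
  package: "my_package", version: "0.0.1", file: "file.txt", token: "<PAT>"
)
# To send it against a running GDK:
# response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
# response.code is expected to be "404" once the file is pending_destruction
# and the feature flag is enabled.
```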

🛃 MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

💿 Database review

Migration up

== 20211213154259 AddStatusToPackagesPackageFiles: migrating ==================
-- add_column(:packages_package_files, :status, :smallint, {:default=>0, :null=>false})
   -> 0.0084s
== 20211213154259 AddStatusToPackagesPackageFiles: migrated (0.0085s) =========

== 20211213154704 AddStatusIndexToPackagesPackageFiles: migrating =============
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:packages_package_files, [:package_id, :status, :id], {:name=>"index_packages_package_files_on_package_id_status_and_id", :algorithm=>:concurrently})
   -> 0.0090s
-- execute("SET statement_timeout TO 0")
   -> 0.0012s
-- add_index(:packages_package_files, [:package_id, :status, :id], {:name=>"index_packages_package_files_on_package_id_status_and_id", :algorithm=>:concurrently})
   -> 0.0097s
-- execute("RESET statement_timeout")
   -> 0.0012s
== 20211213154704 AddStatusIndexToPackagesPackageFiles: migrated (0.0233s) ====

Migration down

== 20211213154704 AddStatusIndexToPackagesPackageFiles: reverting =============
-- transaction_open?()
   -> 0.0000s
-- indexes(:packages_package_files)
   -> 0.0069s
-- execute("SET statement_timeout TO 0")
   -> 0.0013s
-- remove_index(:packages_package_files, {:algorithm=>:concurrently, :name=>"index_packages_package_files_on_package_id_status_and_id"})
   -> 0.0039s
-- execute("RESET statement_timeout")
   -> 0.0011s
== 20211213154704 AddStatusIndexToPackagesPackageFiles: reverted (0.0154s) ====

== 20211213154259 AddStatusToPackagesPackageFiles: reverting ==================
-- remove_column(:packages_package_files, :status, :smallint, {:default=>0, :null=>false})
   -> 0.0056s
== 20211213154259 AddStatusToPackagesPackageFiles: reverted (0.0109s) =========
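The new index is defined on (package_id, status, id). The toy model below suggests why that column order suits the typical access pattern, "installable files of one package, ordered by id". If entries are sorted by that tuple, the matching rows form one contiguous, already-ordered range that a binary search can locate. This is only an illustration of the idea, not how PostgreSQL stores a B-tree. The status value 0 for default is an assumption.

```ruby
# Assumed integer value of the default status.
INDEX_DEFAULT_STATUS = 0

# Toy index: tuples sorted by (package_id, status, id).
def build_index(rows)
  rows.map { |r| [r[:package_id], r[:status], r[:id]] }.sort
end

# Finds the first entry >= [package_id, default] by binary search, then reads
# the contiguous range sharing that prefix. Ids come out already sorted.
def installable_ids(index, package_id)
  prefix = [package_id, INDEX_DEFAULT_STATUS]
  start = index.bsearch_index { |key| (key.first(2) <=> prefix) >= 0 }
  return [] unless start

  index[start..].take_while { |key| key.first(2) == prefix }.map(&:last)
end

rows = [
  { id: 5, package_id: 7, status: 0 },
  { id: 3, package_id: 7, status: 1 },
  { id: 9, package_id: 7, status: 0 },
  { id: 1, package_id: 8, status: 0 }
]
installable_ids(build_index(rows), 7) # => [5, 9]
```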

📊 Explain plans

We have quite a few explain plans to present here, because package files are core elements of the package registry.

For almost all the explain plans, I used the same package from gitlab.com, which has 25K+ package files. Even though the package type is not the right one for every explain plan, this doesn't change how package files are accessed: we usually access them through package_id.

In each explain plan, I included setup instructions for postgres.ai. In this setup, you will often see this line:

EXEC UPDATE packages_package_files SET status = floor(random() * 2) WHERE package_id = XXX;

Basically, I take my target package and randomly set the status of its package files: floor(random() * 2) yields 0 (default) or 1 (pending_destruction). This way, the package files don't all end up in the same status; we get a mix of default and pending_destruction.
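As a quick sanity check of that setup expression, here is a Ruby equivalent. PostgreSQL's random() returns a value in [0, 1), so floor(random() * 2) can only be 0 or 1, with roughly half of each. `random_statuses` is a hypothetical helper, and reading 0 as default and 1 as pending_destruction assumes the enum mapping described above.

```ruby
# Ruby equivalent of floor(random() * 2): rng.rand returns a Float in [0, 1),
# so every value is 0 or 1.
def random_statuses(count, rng: Random.new)
  Array.new(count) { (rng.rand * 2).floor }
end

statuses = random_statuses(10_000, rng: Random.new(42))
statuses.tally # roughly half 0 (default) and half 1 (pending_destruction)
```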

Edited by Hugo Ortiz
