Add package file status attribute and introduce the installable scope
🤹 Context
Over time, users of the package registry fill it with many packages and package files.
These objects count towards object storage usage, so users need a way to manage this storage efficiently. To address this, we're introducing cleanup policies for packages.
A cleanup policy is a set of parameters read by a recurring job. Those parameters target packages / package files and remove them from the package registry. For example: for a given package name, keep the 10 most recent versions and remove everything else.
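To make the "keep the 10 most recent versions" example concrete, here is a minimal plain-Ruby sketch. It is only an illustration: PackageVersion, versions_to_remove, and keep_n are hypothetical names, and the actual policy parameters are not defined in this MR.

```ruby
# Hypothetical, simplified stand-in for a package record (not GitLab code).
PackageVersion = Struct.new(:name, :version, :created_at)

# Returns the versions of `name` that a "keep the N most recent" policy
# would remove: everything after the N newest entries.
def versions_to_remove(packages, name:, keep_n: 10)
  packages
    .select { |pkg| pkg.name == name }
    .sort_by(&:created_at)
    .reverse       # newest first
    .drop(keep_n)  # keep the first N, return the rest
end

# Twelve versions of the same package, created one hour apart.
packages = (1..12).map do |i|
  PackageVersion.new("my_package", "0.0.#{i}", Time.at(i * 3600))
end

removable = versions_to_remove(packages, name: "my_package", keep_n: 10)
puts removable.map(&:version).inspect
```

With 12 versions and keep_n set to 10, only the two oldest versions are selected for removal.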
♻ Removing package files
Removing package files is not a trivial operation. Each package file is linked to a physical file on object storage.
This means that we cannot simply issue a single DELETE statement to the database and be done with it. We need to instantiate the ActiveRecord object and call #destroy! (or any of its cousins) on it. This way, the carrierwave callbacks kick in and issue a DELETE request to object storage.
Because cleanup policies may have to delete massive amounts of package files, we decided to use a "soft" delete. By soft delete, we mean that the full destruction of a package file happens in two steps:
- The package file is "marked" as destroyed.
  - This makes the package file unavailable in the UI/APIs.
  - No interaction with object storage happens during this step.
- The package file is actually destroyed.
  - This is handled by a second background worker that takes the list of "marked" package files as its backlog.
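The two steps above can be sketched in plain Ruby with in-memory stand-ins. This is not the implementation from this MR: FakeObjectStorage, StoredFile, PackageFileStore, and their methods are all hypothetical names chosen for illustration.

```ruby
# In-memory stand-in for object storage; records the DELETE requests it gets.
class FakeObjectStorage
  attr_reader :delete_requests

  def initialize
    @delete_requests = []
  end

  def delete(path)
    @delete_requests << path
  end
end

# Hypothetical row: an id, the object storage path, and a status.
StoredFile = Struct.new(:id, :path, :status)

class PackageFileStore
  def initialize(storage)
    @storage = storage
    @rows = []
  end

  def add(id, path)
    @rows << StoredFile.new(id, path, :default)
  end

  # Step 1: cheap, database-only update. Object storage is not touched.
  def mark_pending_destruction(id)
    @rows.find { |row| row.id == id }.status = :pending_destruction
  end

  # What the UI/APIs would be allowed to see.
  def visible
    @rows.select { |row| row.status == :default }
  end

  # Step 2: a background worker drains the backlog of marked files.
  def destroy_pending!
    backlog = @rows.select { |row| row.status == :pending_destruction }
    backlog.each do |row|
      @storage.delete(row.path) # the expensive object storage call
      @rows.delete(row)
    end
    backlog.size
  end

  def size
    @rows.size
  end
end

storage = FakeObjectStorage.new
store = PackageFileStore.new(storage)
store.add(1, "generic/my_package/0.0.1/file.txt")
store.add(2, "generic/my_package/0.0.2/file.txt")

store.mark_pending_destruction(1)
visible_ids = store.visible.map(&:id)
requests_after_mark = storage.delete_requests.dup

destroyed_count = store.destroy_pending!
```

Marking is a quick status change, so it can be applied to many files at once, while the slow object storage calls are deferred to the second step.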
The above plan is described in issue #345755 (closed).
This MR represents the very first step of the above plan:
- Add a status column to the package files table.
- Make sure that the UI and APIs can't fetch non-default package files.
🔬 What does this MR do and why?
- Add a status column to the packages_package_files table.
  - That status is mapped to a Rails enum.
  - Right now, we only have two values: default and pending_destruction.
  - The Rails enum is better than a simple boolean column because it gives us more flexibility for future evolutions. Imagine that package files can be imported and should be marked as such.
- Add a scope .installable that selects only default package files for now.
  - That scope is similar to what we have for packages.
- Update all the finders/services to use the introduced scope.
- Update all the related specs.
- Because these changes are quite broad (all package formats are impacted) and can introduce ~performance issues (for example, preloads that no longer work = N+1 queries 💥), all the changes are gated behind a feature flag: packages_installable_package_files.
  - This provides an additional safety net.
  - Rollout issue: #348677 (closed)
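As a rough illustration of how the status values, the installable filter, and the feature flag interact, here is a plain-Ruby stand-in. It is not the Rails code from this MR: PackageFileRow, FeatureFlags, and files_for_package are hypothetical names, and the integer 1 for pending_destruction is an assumption (only the column default of 0 appears in the migration output).

```ruby
# Status mapping in the spirit of a Rails enum.
# default => 0 matches the column default; pending_destruction => 1 is assumed.
STATUSES = { default: 0, pending_destruction: 1 }.freeze

# Hypothetical in-memory row, not the real model.
PackageFileRow = Struct.new(:id, :package_id, :status)

# Minimal stand-in for a feature flag store.
module FeatureFlags
  @enabled = {}

  def self.enable(name)
    @enabled[name] = true
  end

  def self.disable(name)
    @enabled[name] = false
  end

  def self.enabled?(name)
    @enabled.fetch(name, false)
  end
end

# Equivalent of the installable scope: only default package files.
def installable(files)
  files.select { |file| file.status == STATUSES[:default] }
end

# A finder that applies the scope only when the flag is enabled.
def files_for_package(files, package_id)
  scoped = files.select { |file| file.package_id == package_id }
  return scoped unless FeatureFlags.enabled?(:packages_installable_package_files)

  installable(scoped)
end

files = [
  PackageFileRow.new(1, 1, STATUSES[:default]),
  PackageFileRow.new(2, 1, STATUSES[:pending_destruction])
]

FeatureFlags.enable(:packages_installable_package_files)
with_flag = files_for_package(files, 1).map(&:id)

FeatureFlags.disable(:packages_installable_package_files)
without_flag = files_for_package(files, 1).map(&:id)
```

With the flag enabled, only the default file is returned; with it disabled, the finder falls back to the previous behavior and returns both files.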
📷 Screenshots or screen recordings
I'm going to show a usage example with the generic package registry, but these changes apply to all package formats.
- Upload a new generic package file
  $ cat dummy.txt
  bananas
  $ curl --header "PRIVATE-TOKEN: <PAT>" --upload-file ./dummy.txt "http://gdk.test:8000/api/v4/projects/509/packages/generic/my_package/0.0.1/file.txt"
  {"message":"201 Created"}
- Pull it
  $ curl --header "PRIVATE-TOKEN: <PAT>" "http://gdk.test:8000/api/v4/projects/509/packages/generic/my_package/0.0.1/file.txt"
  bananas
- Mark it as pending_destruction (in a rails console)
  Packages::PackageFile.last.pending_destruction!
- Pull it
  $ curl --header "PRIVATE-TOKEN: <PAT>" "http://gdk.test:8000/api/v4/projects/509/packages/generic/my_package/0.0.1/file.txt"
  {"message":"404 Not found"}
  - Not found because pending_destruction package files are not available in APIs.
- To complete this demo, let's disable the feature flag (in a rails console)
  Feature.disable(:packages_installable_package_files)
- Let's pull the package file
  $ curl --header "PRIVATE-TOKEN: <PAT>" "http://gdk.test:8000/api/v4/projects/509/packages/generic/my_package/0.0.1/file.txt"
  bananas
  - The pending_destruction package file is found and returned because the feature flag is disabled.
⚗ How to set up and validate locally
- Set up the package registry (any format)
- Create and upload a new package
- Enable the feature flag: Feature.enable(:packages_installable_package_files)
- Mark the package file as pending_destruction: Packages::PackageFile.last.pending_destruction!
- Try to pull the package file
🛃 MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
💿 Database review
⬆ Migration up
== 20211213154259 AddStatusToPackagesPackageFiles: migrating ==================
-- add_column(:packages_package_files, :status, :smallint, {:default=>0, :null=>false})
-> 0.0084s
== 20211213154259 AddStatusToPackagesPackageFiles: migrated (0.0085s) =========
== 20211213154704 AddStatusIndexToPackagesPackageFiles: migrating =============
-- transaction_open?()
-> 0.0000s
-- index_exists?(:packages_package_files, [:package_id, :status, :id], {:name=>"index_packages_package_files_on_package_id_status_and_id", :algorithm=>:concurrently})
-> 0.0090s
-- execute("SET statement_timeout TO 0")
-> 0.0012s
-- add_index(:packages_package_files, [:package_id, :status, :id], {:name=>"index_packages_package_files_on_package_id_status_and_id", :algorithm=>:concurrently})
-> 0.0097s
-- execute("RESET statement_timeout")
-> 0.0012s
== 20211213154704 AddStatusIndexToPackagesPackageFiles: migrated (0.0233s) ====
⬇ Migration down
== 20211213154704 AddStatusIndexToPackagesPackageFiles: reverting =============
-- transaction_open?()
-> 0.0000s
-- indexes(:packages_package_files)
-> 0.0069s
-- execute("SET statement_timeout TO 0")
-> 0.0013s
-- remove_index(:packages_package_files, {:algorithm=>:concurrently, :name=>"index_packages_package_files_on_package_id_status_and_id"})
-> 0.0039s
-- execute("RESET statement_timeout")
-> 0.0011s
== 20211213154704 AddStatusIndexToPackagesPackageFiles: reverted (0.0154s) ====
== 20211213154259 AddStatusToPackagesPackageFiles: reverting ==================
-- remove_column(:packages_package_files, :status, :smallint, {:default=>0, :null=>false})
-> 0.0056s
== 20211213154259 AddStatusToPackagesPackageFiles: reverted (0.0109s) =========
📊 Explain plans
We have quite a few explain plans to present here as package files are core elements of the package registry.
For almost all the explain plans, I used the same package from gitlab.com, which has ~25K+ package files. Even though the package type is not the right one for every explain plan, this does not change how package files are accessed: we usually access them through the package_id.
In each explain plan, I included setup instructions for postgres.ai. In this setup, you will often see this line:
EXEC UPDATE packages_package_files SET status = floor(random() * 2) WHERE package_id = XXX;
Basically, I take my target package and randomly update the status of its package files. This way, we don't end up with all package files in the default or pending_destruction status; it's a mix of both.
app/controllers/projects/packages/infrastructure_registry_controller.rb
app/finders/packages/package_file_finder.rb
app/graphql/types/packages/package_details_type.rb
app/models/concerns/packages/debian/distribution.rb
app/models/packages/package.rb
app/models/packages/package_file.rb scope for_helm_with_channel
app/models/packages/package_file.rb scope most_recent_for
app/presenters/packages/conan/package_presenter.rb
app/presenters/packages/detail/package_presenter.rb
app/presenters/packages/npm/package_presenter.rb
app/presenters/packages/nuget/presenter_helpers.rb
app/presenters/packages/pypi/package_presenter.rb
app/services/packages/maven/metadata/sync_service.rb
lib/api/terraform/modules/v1/packages.rb
lib/api/package_files.rb list
lib/api/package_files.rb delete
lib/api/rubygem_packages.rb