Avoid copy operation in object store during Generic Packages upload
🔥 Problem
Similar to Avoid copying objects from one bucket to anothe... (#285597 - closed), the package registry can receive uploads of large files. Most package registry uploads (most formats) use a Workhorse direct upload. In this mode, the file is put on object storage in a temporary location and, once the backend confirms the upload, it is moved to its final location (using a copy operation).
The problem is that a GitLab instance can be connected to different object storage providers, and the time taken by that copy operation varies with the provider and the file size.
We need to avoid this copy operation altogether. To do that, when the package file is uploaded to Object Storage, we should instruct Workhorse to treat the file's upload location as the final one. The file then never needs to be moved: its initial location is its final location.
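The difference between the two flows can be sketched with an in-memory stand-in for a bucket. This is only an illustration: `FakeBucket`, `upload_with_copy`, `upload_to_final_path`, and the key names are hypothetical and are not part of GitLab or Workhorse.

```ruby
# In-memory stand-in for an object storage bucket that records every operation.
# All class and method names here are hypothetical, for illustration only.
class FakeBucket
  attr_reader :operations, :objects

  def initialize
    @objects = {}
    @operations = []
  end

  def put_object(key, body)
    @operations << :PutObject
    @objects[key] = body
  end

  def copy_object(from, to)
    @operations << :CopyObject
    @objects[to] = @objects.fetch(from)
  end

  def delete_object(key)
    @operations << :DeleteObject
    @objects.delete(key)
  end
end

# Current flow: upload to a temporary key, copy to the final key, then clean up.
def upload_with_copy(bucket, final_key, body)
  tmp_key = "tmp/uploads/#{final_key}"
  bucket.put_object(tmp_key, body)
  bucket.copy_object(tmp_key, final_key)
  bucket.delete_object(tmp_key)
end

# Proposed flow: the upload location is already the final location.
def upload_to_final_path(bucket, final_key, body)
  bucket.put_object(final_key, body)
end

old_flow = FakeBucket.new
upload_with_copy(old_flow, "packages/ananas.txt", "dummy")

new_flow = FakeBucket.new
upload_to_final_path(new_flow, "packages/ananas.txt", "dummy")

puts old_flow.operations.inspect # [:PutObject, :CopyObject, :DeleteObject]
puts new_flow.operations.inspect # [:PutObject]
```

Both flows leave the same object at the final key. The second one simply skips the two extra round trips, and the copy is the one whose cost depends on the provider and the file size.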
Implementation
In this MR, we are going to start with the Generic Package Repository. It's the most straightforward package format, so debugging will be easier if any issues arise.
What does this MR do?
- Adding a new column named `file_final_path` to the `packages_package_files` table with `text` type and `1024` char length.
- Passing two new keyword arguments to the `::Packages::PackageFileUploader.workhorse_authorize` method:
  - `use_final_store_path: true`
  - `final_store_path_root_id: <project_or_group_id>`
- Gating the changes behind a feature flag for gradual rollout.
- Covering the changes with specs
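
A minimal sketch of how the two new keyword arguments could be gated behind the feature flag. It does not reproduce the MR's actual code: `FeatureStub`, `UploaderStub`, and `authorize_generic_upload` are hypothetical stand-ins for `::Feature`, `::Packages::PackageFileUploader`, and the API endpoint, so the snippet runs on its own. The stub accepts only the two arguments named above.

```ruby
# Hypothetical stand-in for GitLab's Feature module, so the sketch runs on its own.
module FeatureStub
  @enabled = {}

  def self.enable(name)
    @enabled[name] = true
  end

  def self.enabled?(name)
    @enabled.fetch(name, false)
  end
end

# Hypothetical stand-in for ::Packages::PackageFileUploader.workhorse_authorize.
# It only echoes back the keyword arguments it received.
module UploaderStub
  def self.workhorse_authorize(**kwargs)
    kwargs
  end
end

# Builds the authorize call; the two new keyword arguments are passed
# only when the feature flag is enabled.
def authorize_generic_upload(project_id)
  kwargs = {}
  if FeatureStub.enabled?(:skip_copy_operation_in_generic_packages_upload)
    kwargs[:use_final_store_path] = true
    kwargs[:final_store_path_root_id] = project_id
  end
  UploaderStub.workhorse_authorize(**kwargs)
end

flag_off = authorize_generic_upload(42)
FeatureStub.enable(:skip_copy_operation_in_generic_packages_upload)
flag_on = authorize_generic_upload(42)

puts flag_off.inspect
puts flag_on.inspect
```

With the flag disabled the call is unchanged, which is what makes a gradual rollout and a quick rollback possible.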
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
How to set up and validate locally
- Make sure Object Storage is enabled in your GDK.
- Open the `trace` tool in the MinIO web interface and click the `Start` button to trace the operations done in the Object Storage.
- From a terminal, publish a dummy file (you can create a dummy `.txt` file) to the Generic Package Repository: `curl --header "PRIVATE-TOKEN: <PAT>" --upload-file ./dummy.txt "http://gdk.test:3000/api/v4/projects/<project_id>/packages/generic/my_awesome_package/1.3.7/ananas.txt"`
- In the MinIO trace window, you should see the operations triggered by the upload. 📓 Notice the 3 operations: `PutObject`, `CopyObject` & `DeleteObject`.
- In the Rails console, enable the `skip_copy_operation_in_generic_packages_upload` feature flag: `::Feature.enable(:skip_copy_operation_in_generic_packages_upload)`
- Upload another dummy generic package while watching the trace window. Now only one operation is done: `PutObject` 🚀
- You can test downloading the published package. Things should work normally.
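
If you prefer scripting the upload step instead of using `curl`, the same request can be sketched in Ruby with the standard library's `Net::HTTP`. The helper name `build_generic_package_upload` and the project ID `1` are assumptions for illustration. The URL shape and the `PRIVATE-TOKEN` header come from the `curl` command above, and `curl --upload-file` issues a `PUT`. The sketch only builds the request; sending it is left commented out.

```ruby
require "net/http"
require "uri"

# Builds (but does not send) the PUT request that the curl command performs.
# The helper name and its parameters are hypothetical.
def build_generic_package_upload(base_url:, project_id:, token:, package:, version:, file_name:, body:)
  uri = URI("#{base_url}/api/v4/projects/#{project_id}/packages/generic/#{package}/#{version}/#{file_name}")
  request = Net::HTTP::Put.new(uri)
  request["PRIVATE-TOKEN"] = token
  request.body = body
  [uri, request]
end

uri, request = build_generic_package_upload(
  base_url: "http://gdk.test:3000",
  project_id: 1,        # placeholder project ID, replace with yours
  token: "<PAT>",       # placeholder personal access token
  package: "my_awesome_package",
  version: "1.3.7",
  file_name: "ananas.txt",
  body: "dummy content\n"
)

puts "#{request.method} #{request.path}"

# To actually send it against a running GDK:
# response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
# puts response.code
```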
Related to #429060 (closed)