
Avoid copy operation in object store during Generic Packages upload

🔥 Problem

Similar to "Avoid copying objects from one bucket to anothe..." (#285597 - closed), the package registry can receive uploads of large files. Most package registry uploads (for most formats) use Workhorse direct upload. In this mode, the file is first written to a temporary location in object storage; when the backend confirms the upload, the file is moved to its final location using a copy operation.
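To make the cost concrete, here is a minimal sketch of the current flow against a hypothetical in-memory object store (`FakeObjectStore`, `upload_with_move`, and the object keys are illustrative assumptions, not GitLab or Workhorse code). It records each operation the way the MinIO trace tool would, showing that a "move" on object storage is really a copy followed by a delete.

```ruby
# Hypothetical in-memory stand-in for an object storage bucket that
# records every operation, similar to what the MinIO trace tool shows.
class FakeObjectStore
  attr_reader :objects, :operations

  def initialize
    @objects = {}
    @operations = []
  end

  def put_object(key, data)
    @operations << :PutObject
    @objects[key] = data
  end

  # Copying duplicates the bytes, so its cost grows with the object size.
  def copy_object(src, dst)
    @operations << :CopyObject
    @objects[dst] = @objects.fetch(src).dup
  end

  def delete_object(key)
    @operations << :DeleteObject
    @objects.delete(key)
  end
end

# Current direct-upload flow (sketch): Workhorse writes to a temporary key,
# then the backend "moves" the file, which object storage implements as
# copy + delete.
def upload_with_move(store, data, tmp_key:, final_key:)
  store.put_object(tmp_key, data)       # Workhorse direct upload
  store.copy_object(tmp_key, final_key) # backend confirms: copy to final location
  store.delete_object(tmp_key)          # remove the temporary object
  final_key
end

store = FakeObjectStore.new
upload_with_move(store, "hello", tmp_key: "tmp/uploads/abc", final_key: "packages/final/abc")
p store.operations # => [:PutObject, :CopyObject, :DeleteObject]
```

The three recorded operations match the PutObject, CopyObject, and DeleteObject calls visible in the MinIO trace before this change.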

The problem is that a GitLab instance can be connected to different object storage providers, and the duration of that copy operation varies with the provider and grows with the file size.

We need to avoid this copy operation entirely. To do that, when the package file is uploaded to object storage, we should instruct Workhorse to treat the upload location as the final one. The file then never needs to be moved: its initial location is its final location.
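A minimal sketch of the target flow, under the assumption that the final object key is decided before the upload (at authorization time) so Workhorse can write straight to it. `FakeObjectStore`, `final_store_path`, and the key layout are hypothetical; the real path scheme is not described here. Storing the returned key in the new `file_final_path` column is likewise an assumption.

```ruby
# Hypothetical in-memory object store that logs operations.
class FakeObjectStore
  attr_reader :objects, :operations

  def initialize
    @objects = {}
    @operations = []
  end

  def put_object(key, data)
    @operations << :PutObject
    @objects[key] = data
  end
end

# Hypothetical path builder: the final key is known before the upload
# starts. This layout is illustrative only, not GitLab's real scheme.
def final_store_path(root_id, upload_id)
  "#{root_id}/packages/#{upload_id}"
end

def upload_to_final_path(store, data, root_id:, upload_id:)
  key = final_store_path(root_id, upload_id)
  store.put_object(key, data) # the only object storage call
  key # assumed to be persisted by the backend, e.g. in file_final_path
end

store = FakeObjectStore.new
key = upload_to_final_path(store, "hello", root_id: 42, upload_id: "abc")
p key              # => "42/packages/abc"
p store.operations # => [:PutObject]
```

Because the key is fixed up front, no copy or delete is needed, which is the single PutObject expected in the trace once the change is enabled.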

Implementation

In this MR, we start with the Generic Package Repository. It's the most straightforward package format, so it will be the easiest to debug if any issues arise.

What does this MR do?

  • Adds a new column, file_final_path, of type text with a 1024-character limit, to the packages_package_files table.
  • Passes two new keyword arguments to the ::Packages::PackageFileUploader.workhorse_authorize method:
    • use_final_store_path: true
    • final_store_path_root_id: <project_or_group_id>
  • Gates the changes behind a feature flag for a gradual rollout.
  • Covers the changes with specs.
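The feature-flag gating of the two new arguments can be sketched in plain Ruby. `FakeFeature` and `FakeUploader` are hypothetical stand-ins for `::Feature` and `::Packages::PackageFileUploader` (the stub simply echoes its keyword arguments), and any other arguments the real `workhorse_authorize` call takes are omitted. Since generic packages are project-scoped, the project ID is used as `final_store_path_root_id`.

```ruby
# Hypothetical feature-flag stand-in (not GitLab's ::Feature).
module FakeFeature
  @enabled = {}

  def self.enable(name)
    @enabled[name] = true
  end

  def self.enabled?(name)
    @enabled.fetch(name, false)
  end
end

# Hypothetical uploader stub: returns the keyword arguments it receives
# so we can inspect what would be sent to Workhorse.
class FakeUploader
  def self.workhorse_authorize(**options)
    options
  end
end

# The two new keyword arguments are only passed when the flag is enabled;
# otherwise the existing direct-upload behavior is kept.
def authorize_generic_upload(project_id)
  options = {}
  if FakeFeature.enabled?(:skip_copy_operation_in_generic_packages_upload)
    options[:use_final_store_path] = true
    options[:final_store_path_root_id] = project_id
  end
  FakeUploader.workhorse_authorize(**options)
end

before = authorize_generic_upload(7)
FakeFeature.enable(:skip_copy_operation_in_generic_packages_upload)
after = authorize_generic_upload(7)
p before
p after
```

With the flag disabled, `before` is an empty hash; after enabling it, `after` contains `use_final_store_path: true` and `final_store_path_root_id: 7`. Keeping the old path as the default is what makes a gradual rollout possible.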

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.


How to set up and validate locally

  1. Make sure Object Storage is enabled in your GDK.

  2. Open the Trace tool in the MinIO web interface and click the Start button to trace the operations performed on object storage.

  3. From a terminal, publish a dummy file (for example, a dummy .txt file) to the Generic Package Repository:

    curl --header "PRIVATE-TOKEN: <PAT>" --upload-file ./dummy.txt "http://gdk.test:3000/api/v4/projects/<project_id>/packages/generic/my_awesome_package/1.3.7/ananas.txt"
  4. In the MinIO trace window, you should see something similar to this screenshot (Screenshot_2023-10-10_at_21.34.58). 📓 Notice the three operations: PutObject, CopyObject, and DeleteObject.

  5. In the Rails console, enable the skip_copy_operation_in_generic_packages_upload feature flag:

    ::Feature.enable(:skip_copy_operation_in_generic_packages_upload)
  6. Upload another dummy generic package while watching the trace window. You should see something similar to this screenshot (Screenshot_2023-10-10_at_21.35.57): now only one operation is performed, PutObject 🚀

  7. Test downloading the published package. It should work as before.
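As an alternative to curl, the upload in step 3 and the download in step 7 can be scripted with Ruby's standard `Net::HTTP`. This sketch only builds the requests; the host matches the curl example, while the project ID `1`, the `<PAT>` token, and the helper names are placeholders to replace for your GDK.

```ruby
require "net/http"
require "uri"

# Builds the Generic Package Repository file URL used in the steps above.
def generic_package_uri(base_url, project_id, name, version, file_name)
  URI("#{base_url}/api/v4/projects/#{project_id}/packages/generic/#{name}/#{version}/#{file_name}")
end

# PUT request equivalent to `curl --upload-file`.
def upload_request(uri, token, body)
  req = Net::HTTP::Put.new(uri)
  req["PRIVATE-TOKEN"] = token
  req.body = body
  req
end

# GET request to download the published file again.
def download_request(uri, token)
  req = Net::HTTP::Get.new(uri)
  req["PRIVATE-TOKEN"] = token
  req
end

uri = generic_package_uri("http://gdk.test:3000", 1, "my_awesome_package", "1.3.7", "ananas.txt")
put = upload_request(uri, "<PAT>", "dummy content\n")
get = download_request(uri, "<PAT>")

# To actually send them against a running GDK:
# Net::HTTP.start(uri.host, uri.port) do |http|
#   puts http.request(put).code
#   puts http.request(get).body
# end
```

Running the upload twice, once before and once after enabling the feature flag, should reproduce the two MinIO traces described above.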

Related to #429060 (closed)

Edited by Moaz Khalifa
