
Add SkipDelete option to the direct upload authorize response

🚀 Context

Workhorse direct uploads are an approach to handling file uploads. In short, the idea is to let Workhorse upload the file to object storage while Rails simply drives the process.

Here is a (simplified) interaction schema:

sequenceDiagram
  autonumber
  Client->>Workhorse: Here is a file to upload.
  Workhorse->>Rails: Hello there! I have a file upload where do I put it?
  Rails->>Workhorse: Sure, put it here <temporary location>.
  Workhorse->>Object Storage: Put this file in <temporary location>.
  Object Storage->>Workhorse: Done! 
  Workhorse->>Rails: Upload done! It's here <temporary location>.
  Rails->>Rails: Process the uploaded file
  Rails->>Object Storage: Copy file from <temporary location> to <final location>.
  Object Storage->>Rails: Done! 
  Rails->>Object Storage: Delete file in <temporary location>.
  Object Storage->>Rails: Done! 
  Rails->>Workhorse: Ok, all good.
  Workhorse->>Object Storage: Delete file in <temporary location>.
  Object Storage->>Workhorse: Done.  
  Workhorse->>Client: Ok, all good. 

The important pieces are:

  • Interaction (2.). Workhorse asks Rails where to put the file on object storage. The Rails endpoint usually ends with /authorize.
  • Interaction (6.). Workhorse confirms the upload to Rails. Rails can then start processing the upload (e.g. creating whatever rows are needed in the database).
  • Note that the file is first put in a <temporary location>. Only when Rails processes the upload is the file moved to the <final location>.
  • Notice interactions (10.) and (13.). Both delete the <temporary location>, because whether or not Rails handles the upload successfully, we want the <temporary location> to be removed.
    • Interaction (13.) is a safety net. Workhorse will delete the <temporary location>, no matter what happens in Rails with the upload.
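
To make the safety net concrete, here is a minimal Go sketch of the flow above. It is not the actual Workhorse code: the in-memory `store` type and the `railsFinalize` and `workhorseUpload` functions are hypothetical names invented for this illustration, and a map stands in for the object storage bucket.

```go
package main

import (
	"errors"
	"fmt"
)

// store is a hypothetical in-memory stand-in for an object storage bucket.
type store map[string][]byte

// railsFinalize models interactions 7 to 11: Rails copies the object from the
// temporary location to the final one, then deletes the temporary copy.
// The fail flag simulates Rails being unable to process the upload.
func railsFinalize(s store, tmp, final string, fail bool) error {
	if fail {
		return errors.New("rails could not process the upload")
	}
	s[final] = s[tmp]
	delete(s, tmp)
	return nil
}

// workhorseUpload models interactions 4 to 14 from the Workhorse side.
func workhorseUpload(s store, tmp, final string, data []byte, railsFails bool) error {
	s[tmp] = data
	// Safety net (interaction 13): always remove the temporary object,
	// whether or not Rails succeeded. Deleting a missing key is a no-op.
	defer func() { delete(s, tmp) }()
	return railsFinalize(s, tmp, final, railsFails)
}

func main() {
	ok := store{}
	fmt.Println(workhorseUpload(ok, "tmp/a", "final/a", []byte("x"), false), len(ok))

	failed := store{}
	fmt.Println(workhorseUpload(failed, "tmp/b", "final/b", []byte("y"), true), len(failed))
}
```

In the successful case only the final object remains. In the failing case nothing is left behind, which is exactly what the safety net is for.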

The problem with all this 🤹 on object storage is that it can be painful with large files. Avoid copying objects from one bucket to anothe... (#285597 - closed) is an example of that.

🚒 Solution

The main idea is to avoid the <temporary location> completely. Basically, instruct Workhorse to put the file in the <final location> directly. We would have these interactions:

sequenceDiagram
  autonumber
  Client->>Workhorse: Here is a file to upload.
  Workhorse->>Rails: Hello there! I have a file upload where do I put it?
  Rails->>Workhorse: Sure, put it here <final location>.
  Workhorse->>Object Storage: Put this file in <final location>.
  Object Storage->>Workhorse: Done! 
  Workhorse->>Rails: Upload done! It's here <final location>.
  Rails->>Rails: Process the uploaded file
  Rails->>Workhorse: Ok, all good.
  Workhorse->>Client: Ok, all good. 

This way, the copy (8.) and both deletes (10. and 13.) from the first schema are no longer needed.

The above idea is being implemented for CI job artifacts in Draft: Skip copying artifacts uploaded to final... (!105074 - closed).

While reviewing that MR, we noticed that we need a workhorse change. We need a way to instruct workhorse to not delete the object targeted by the /authorize response (interaction (2.)).

In other words, the "delete <temporary location>" safety net in workhorse should be an optional one and rails should be able to decide if we use it or not.

This MR introduces such an option.

🔬 What does this MR do and why?

  • Add an additional field in the /authorize response: SkipDelete. It defaults to false.
  • When SkipDelete=true, workhorse will stop cleaning up the remote object that is targeted by the /authorize response.
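
As a rough illustration of the behavior described in these two points, the Go sketch below decodes a trimmed-down /authorize response and only runs the cleanup when SkipDelete is false. Only the SkipDelete name comes from this MR. The `authorizeResponse` struct, its `RemoteID` field, the `store` type and the `upload` function are assumptions made for the example and do not mirror the real Workhorse types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// authorizeResponse is a hypothetical, trimmed-down /authorize response.
// RemoteID is an assumed field name for the targeted object.
type authorizeResponse struct {
	RemoteID   string `json:"RemoteID"`
	SkipDelete bool   `json:"SkipDelete"` // absent in the JSON means false
}

// store is a hypothetical in-memory stand-in for an object storage bucket.
type store map[string][]byte

// upload puts data at the location named by the /authorize response and then
// applies the cleanup safety net unless SkipDelete is set.
func upload(s store, authorizeBody, data []byte) error {
	var resp authorizeResponse
	if err := json.Unmarshal(authorizeBody, &resp); err != nil {
		return err
	}
	s[resp.RemoteID] = data
	if !resp.SkipDelete {
		// Current behavior: the targeted object is always cleaned up.
		delete(s, resp.RemoteID)
	}
	return nil
}

func main() {
	s := store{}
	// No SkipDelete field: the default (false) applies and the object is removed.
	_ = upload(s, []byte(`{"RemoteID":"tmp/1"}`), []byte("a"))
	// SkipDelete=true: the object survives at its final location.
	_ = upload(s, []byte(`{"RemoteID":"final/1","SkipDelete":true}`), []byte("b"))
	fmt.Println(len(s), string(s["final/1"])) // prints "1 b"
}
```

Relying on the zero value of the boolean is what keeps the change backward compatible in this sketch: a response that never mentions SkipDelete behaves as before.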

Note that this change is not used (yet) by any upload on the Rails side. The default value is false, which keeps the current behavior unchanged.

Draft: Skip copying artifacts uploaded to final... (!105074 - closed) is responsible for flipping the SkipDelete flag so that the upload goes into the <final location> directly.

📺 Screenshots or screen recordings

For the results below, we're going to use the Generic package registry. It allows us to do a workhorse direct upload very easily (one $ curl command).

1️⃣ With SkipDelete set to false

Workhorse object storage destination | Upload successful? | Can pull file after the upload?
Filesystem (object storage disabled) | |
GoCloud | |
S3 client | |
S3 multipart | |
Presigned put | |

2️⃣ With SkipDelete set to true

Workhorse object storage destination | Upload successful? | Can pull file after the upload?
Filesystem (object storage disabled) | |
GoCloud | |
S3 client | |
S3 multipart | |
Presigned put | |

🔮 Conclusions

This change doesn't impact any of the object storage configurations! 🎉

This is expected, as nothing on the Rails side uses the new option (for now).

How to set up and validate locally

Since the change is not connected to the codebase (yet), we will play around with the default value of SkipDelete in the lib/object_storage/direct_upload.rb file.

  1. Have GDK ready with object storage.
    • Configure it however you want (see the destinations listed above).
  2. Decide whether to set SkipDelete in lib/object_storage/direct_upload.rb.
  3. Upload a package to the generic package registry:
    $ curl --upload-file ./dummy.txt "http://<user>:<pat>@<gdk_host>/api/v4/projects/<project id>/packages/generic/package/1.5.1/dummy.txt"
  4. Try to pull back the file:
    $ curl -L "http://<user>:<pat>@<gdk_host>/api/v4/projects/<project id>/packages/generic/package/1.5.1/dummy.txt"  

🛃 MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by David Fernandez
