Add SkipDelete option to the direct upload authorize response
🚀 Context
Workhorse direct uploads is an approach for handling file uploads. In short, the idea is to let workhorse upload the file to object storage while rails simply drives the process.
Here is a (simplified) interaction schema:
sequenceDiagram
autonumber
Client->>Workhorse: Here is a file to upload.
Workhorse->>Rails: Hello there! I have a file upload where do I put it?
Rails->>Workhorse: Sure, put it here <temporary location>.
Workhorse->>Object Storage: Put this file in <temporary location>.
Object Storage->>Workhorse: Done!
Workhorse->>Rails: Upload done! It's here <temporary location>.
Rails->>Rails: Process the uploaded file
Rails->>Object Storage: Copy file from <temporary location> to <final location>.
Object Storage->>Rails: Done!
Rails->>Object Storage: Delete file in <temporary location>.
Object Storage->>Rails: Done!
Rails->>Workhorse: Ok, all good.
Workhorse->>Object Storage: Delete file in <temporary location>.
Object Storage->>Workhorse: Done.
Workhorse->>Client: Ok, all good.
The important pieces are:
- Interaction (2.). Workhorse will ask rails where to put the file on object storage. Usually the endpoint in rails ends with /authorize.
- Interaction (6.). Workhorse will confirm the upload to rails. Rails can then start processing the upload (e.g. creating whatever rows are needed in the database).
- It's important to see that the file is first put in a <temporary location>. It's only when rails processes the upload that the file is moved to the <final location>.
- Notice interactions (10.) and (13.). Those delete the <temporary location>. That's because no matter if rails successfully handles the upload or not, we want the <temporary location> to be removed.
  - Interaction (13.) is a safety net. Workhorse will delete the <temporary location>, no matter what happens in rails with the upload.
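To make the cost of this flow concrete, here is a minimal sketch that replays the object storage requests from the diagram above against an in-memory stand-in. The `ObjectStorage` class, the function name and the object keys are all hypothetical and exist only for illustration; this is not how workhorse or rails are implemented.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectStorage:
    """In-memory stand-in for a bucket that records every request it receives."""

    objects: dict = field(default_factory=dict)
    requests: list = field(default_factory=list)

    def put(self, key, data):
        self.requests.append(("PUT", key))
        self.objects[key] = data

    def copy(self, src, dst):
        self.requests.append(("COPY", src, dst))
        self.objects[dst] = self.objects[src]

    def delete(self, key):
        self.requests.append(("DELETE", key))
        self.objects.pop(key, None)  # deleting a missing object is a no-op


def upload_via_temporary_location(storage, data, tmp_key, final_key):
    """Replay the object storage requests of the current flow (hypothetical helper)."""
    storage.put(tmp_key, data)        # (4.) workhorse uploads to the temporary location
    storage.copy(tmp_key, final_key)  # (8.) rails copies to the final location
    storage.delete(tmp_key)           # (10.) rails deletes the temporary location
    storage.delete(tmp_key)           # (13.) workhorse safety-net delete


storage = ObjectStorage()
upload_via_temporary_location(storage, b"hello", "tmp/uploads/abc", "final/dummy.txt")
print(len(storage.requests))    # 4 requests for a single upload
print(sorted(storage.objects))  # only the final location remains
```

In this sketch a single upload costs four object storage requests, and the second delete finds nothing left to remove.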
The problem with all this
🚒 Solution
The main idea is to avoid the <temporary location> completely. Basically, instruct workhorse to put the file in the <final location> directly. We would have these interactions:
sequenceDiagram
autonumber
Client->>Workhorse: Here is a file to upload.
Workhorse->>Rails: Hello there! I have a file upload where do I put it?
Rails->>Workhorse: Sure, put it here <final location>.
Workhorse->>Object Storage: Put this file in <final location>.
Object Storage->>Workhorse: Done!
Workhorse->>Rails: Upload done! It's here <final location>.
Rails->>Rails: Process the uploaded file
Rails->>Workhorse: Ok, all good.
Workhorse->>Client: Ok, all good.
This way:
- We don't need to move the file within Object Storage. We completely avoid the problem described in Avoid copying objects from one bucket to anothe... (#285597 - closed). 💪
- We don't need to have multiple delete requests for <temporary location>.
- Overall, we have fewer interactions with Object Storage.
The above idea is being implemented for CI job artifacts in Draft: Skip copying artifacts uploaded to final... (!105074 - closed).
While reviewing that MR, we noticed that we need a workhorse change. We need a way to instruct workhorse to not delete the object targeted by the /authorize response (interaction (2.)).
In other words, the "delete <temporary location>" safety net in workhorse should be optional, and rails should be able to decide whether we use it or not.
This MR introduces such an option.
🔬 What does this MR do and why?
- Add an additional field in the /authorize response: SkipDelete. Defaults to false.
- When SkipDelete=true, workhorse will stop cleaning up the remote object that is targeted by the /authorize response.
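The behavior described above can be sketched as follows. This is an illustration under assumptions, not the workhorse implementation: the JSON layout, the `ObjectKey` field and both function names are hypothetical. Only the `SkipDelete` field and its `false` default come from this MR.

```python
import json


def parse_authorize_response(body: str) -> dict:
    """Read a JSON /authorize response body (hypothetical layout).

    SkipDelete falls back to False when the field is absent, which
    matches the default described in this MR.
    """
    payload = json.loads(body)
    return {
        "object_key": payload["ObjectKey"],  # hypothetical field name
        "skip_delete": bool(payload.get("SkipDelete", False)),
    }


def cleanup_after_upload(remote: dict, bucket: dict) -> None:
    """Safety-net cleanup: remove the targeted object unless told to skip it."""
    if remote["skip_delete"]:
        return  # the object already sits in its final location, keep it
    bucket.pop(remote["object_key"], None)


bucket = {"tmp/abc": b"data", "final/abc": b"data"}

# Field absent: same behavior as today, the targeted object is cleaned up.
default = parse_authorize_response('{"ObjectKey": "tmp/abc"}')
cleanup_after_upload(default, bucket)

# SkipDelete=true: the targeted object is left untouched.
direct = parse_authorize_response('{"ObjectKey": "final/abc", "SkipDelete": true}')
cleanup_after_upload(direct, bucket)

print(sorted(bucket))
```

Keeping the default at `false` means existing uploads behave exactly as before, and only an upload that explicitly opts in keeps its object.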
Note that this change is not used (yet) by any upload on the rails side. The default value is false, which keeps the exact same behavior that we have currently.
Draft: Skip copying artifacts uploaded to final... (!105074 - closed) is responsible for flipping the SkipDelete flag so that the upload goes into the <final location> directly.
📺 Screenshots or screen recordings
For the results below, we're going to use the Generic package registry. It allows us to do a workhorse direct upload very easily (one $ curl command).
1️⃣ With SkipDelete set to false
Workhorse object storage destination | Upload successful? | Can pull file after the upload? |
---|---|---|
Filesystem (object storage disabled) | ||
GoCloud | ||
S3 client | ||
S3 multipart | ||
Presigned put | ||
2️⃣ With SkipDelete set to true
Workhorse object storage destination | Upload successful? | Can pull file after the upload? |
---|---|---|
Filesystem (object storage disabled) | ||
GoCloud | ||
S3 client | ||
S3 multipart | ||
Presigned put | ||
🔮 Conclusions
This change doesn't impact any of the object storage configurations!
This is expected as this change is not really connected to the codebase (for now).
⚗ How to set up and validate locally
Since the change is not connected to the codebase (yet), we will play around with the default value of SkipDelete in the lib/object_storage/direct_upload.rb file.
- Have GDK ready with object storage.
- Configure it in whatever way you want (see above).
- Decide to SkipDelete or not in lib/object_storage/direct_upload.rb.
- Upload a package to the generic package registry:
$ curl --upload-file ./dummy.txt "http://<user>:<pat>@<gdk_host>/api/v4/projects/<project id>/packages/generic/package/1.5.1/dummy.txt"
- Try to pull back the file:
$ curl -L "http://<user>:<pat>@<gdk_host>/api/v4/projects/<project id>/packages/generic/package/1.5.1/dummy.txt"
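The two curl commands can also be scripted as a round-trip check. Below is a minimal sketch using only the Python standard library. The helper names are hypothetical, the host, project id and credentials are placeholders you must supply, and it assumes the API accepts HTTP basic authentication with the user and personal access token, as the URLs in the curl commands do.

```python
import base64
import urllib.request


def package_url(gdk_host, project_id, version="1.5.1", filename="dummy.txt"):
    """Build the same generic package URL as the curl commands above."""
    return (
        f"http://{gdk_host}/api/v4/projects/{project_id}"
        f"/packages/generic/package/{version}/{filename}"
    )


def basic_auth(user, pat):
    """Encode user and personal access token as a basic auth header value."""
    token = base64.b64encode(f"{user}:{pat}".encode()).decode()
    return f"Basic {token}"


def upload_then_pull(url, auth, payload):
    """PUT the payload, GET it back, and report whether the bytes match."""
    put = urllib.request.Request(url, data=payload, method="PUT")
    put.add_header("Authorization", auth)
    put.add_header("Content-Type", "application/octet-stream")
    with urllib.request.urlopen(put):
        pass  # urlopen raises HTTPError if the upload is rejected

    get = urllib.request.Request(url)
    # Like `curl -L`, follow redirects, but do not forward the credentials
    # to the redirect target (for example an object storage URL).
    get.add_unredirected_header("Authorization", auth)
    with urllib.request.urlopen(get) as response:
        return response.read() == payload
```

A call such as `upload_then_pull(package_url(host, project_id), basic_auth(user, pat), b"dummy")` should return `True` for both values of SkipDelete if the upload and the pull behave as expected.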
🛃 MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.