Use workhorse to perform object storage uploads, asking unicorn only to authorize and finalize the upload.
Use the LFS protocol to return a signed upload URL to the client, which the LFS client will use to upload directly to S3-compatible storage.
Why
We still want full control over the upload process; at a minimum we can save unicorn resources by uploading to object storage while workhorse receives the object.
The way this currently works, we need NFS to handle an LFS upload:
1. workhorse checks with unicorn whether the incoming LFS upload is allowed
2. workhorse accepts the LFS upload and hands off the file to Unicorn (same machine)
3. Unicorn hands off to Sidekiq (any machine)
4. Sidekiq uploads the LFS file to S3
It would be perfectly feasible to do this instead:
1. workhorse checks with unicorn whether the upload is allowed
2. unicorn hands back an S3 upload link to workhorse
3. workhorse accepts the file and simultaneously streams it to S3
4. workhorse notifies Unicorn that the file has been uploaded
This would be synchronous; when the LFS client gets confirmation that its upload has been accepted, it is already in S3.
But the big reason to do this is that you don't need NFS to store the file in between accepting it and uploading it to S3.
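As a rough illustration of the streaming step, here is a minimal Go sketch: workhorse forwards the incoming request body to a presigned S3 PUT URL that unicorn handed back, so the object never touches local disk or NFS. The handler shape, the placeholder URL, and the wiring in `main` are assumptions for the sketch, not the actual workhorse API.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

// handleLFSPut streams the incoming LFS object body straight to a
// presigned S3 PUT URL, without buffering it on local disk or NFS.
// storeURL would come from unicorn's authorize response; the plumbing
// here is hypothetical.
func handleLFSPut(w http.ResponseWriter, r *http.Request, storeURL string) {
	req, err := http.NewRequest(http.MethodPut, storeURL, r.Body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	// S3 wants the exact length up front; LFS clients send Content-Length.
	req.ContentLength = r.ContentLength

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	io.Copy(io.Discard, resp.Body)

	if resp.StatusCode != http.StatusOK {
		http.Error(w, fmt.Sprintf("object store returned %d", resp.StatusCode), http.StatusBadGateway)
		return
	}
	// At this point workhorse would notify unicorn that the object landed.
	w.WriteHeader(http.StatusOK)
}

func main() {
	// Placeholder wiring: in real workhorse the URL comes from /authorize.
	http.HandleFunc("/upload", func(w http.ResponseWriter, r *http.Request) {
		handleLFSPut(w, r, "https://bucket.s3.example.com/tmp/oid?X-Amz-Signature=...")
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```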
Edit: or we do a direct upload from the LFS client to S3, of course. I'm not sure how we would verify the SHA256 of the upload then. But either way, I think it would be good to get rid of NFS as an intermediary.
We can extend the LFS code to make a direct upload to Object Storage.
Then we no longer have the problems described here, as we can easily migrate all existing data and support local and object storage concurrently, at will.
Git LFS is already prepared for multi-stage uploads; we would have to:
1. On authorize, return information about where to upload by creating preauthorized POST requests to Object Storage (already achievable; see the sketch after this list),
2. Make Workhorse aware that it might not receive the file, but instead receive information that the file was already uploaded,
3. Make Rails decode the temporary file reference and copy the object on Object Storage to the final path expected by CarrierWave.
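In practice the Rails side would generate these preauthorized POSTs, but to make step 1 concrete, here is a hedged sketch using minio-go v7 of what the authorize step would hand back. The endpoint, credentials, bucket, key, and limits are all placeholders.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	// Endpoint and credentials are placeholders.
	client, err := minio.New("s3.example.com", &minio.Options{
		Creds:  credentials.NewStaticV4("ACCESS_KEY", "SECRET_KEY", ""),
		Secure: true,
	})
	if err != nil {
		log.Fatal(err)
	}

	// Build a POST policy: the uploader may only write this one key,
	// within a size limit, before the policy expires.
	policy := minio.NewPostPolicy()
	policy.SetBucket("lfs-objects")
	policy.SetKey("tmp/uploads/some-oid")
	policy.SetExpires(time.Now().UTC().Add(15 * time.Minute))
	policy.SetContentLengthRange(1, 5*1024*1024*1024)

	url, formData, err := client.PresignedPostPolicy(context.Background(), policy)
	if err != nil {
		log.Fatal(err)
	}
	// The authorize response would carry url and formData back to the uploader.
	fmt.Println(url, formData)
}
```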
Complexity
Moderate. It mostly requires Rails knowledge. The remaining quirks, like how to generate preauthorized POSTs and copy objects, are already figured out (in one of my branches). The current LFS/Artifacts Object Storage code supports this mode of operation.
A little work is needed to extend the authorize scheme to carry more information, and to make the LFS code aware of remotely stored uploads.
If the file goes straight to Object Storage, Workhorse would no longer be able to verify that the file's SHA256 and OID match, and we would be creating LfsObject records (and giving users access to existing LFS objects) purely on faith that the client is not lying to us.
Another possibility: we could continue to have the client talk to Workhorse and have Workhorse stream directly to object storage (using https://github.com/minio/minio-go or any number of other S3 client libraries).
This way, we can verify checksums and tell the GitLab API about the object location once verification and upload are complete.
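A minimal sketch of that idea, again using minio-go v7: a TeeReader hashes the body while PutObject streams it to storage, and the result is compared against the OID the LFS client claimed. The function name, plumbing, and cleanup policy are illustrative assumptions, not the eventual Workhorse API.

```go
package objectstore

import (
	"context"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"

	"github.com/minio/minio-go/v7"
)

// putAndVerify streams body to object storage while hashing it, then
// compares the SHA256 against the OID the LFS client claimed.
func putAndVerify(ctx context.Context, client *minio.Client, bucket, key string, body io.Reader, size int64, expectedOID string) error {
	hasher := sha256.New()
	// TeeReader feeds the hasher as PutObject streams the bytes to S3,
	// so no local temp file is needed.
	tee := io.TeeReader(body, hasher)

	if _, err := client.PutObject(ctx, bucket, key, tee, size, minio.PutObjectOptions{}); err != nil {
		return err
	}

	actual := hex.EncodeToString(hasher.Sum(nil))
	if actual != expectedOID {
		// The client lied (or the transfer was corrupted): remove the
		// object rather than registering a bogus LfsObject.
		_ = client.RemoveObject(ctx, bucket, key, minio.RemoveObjectOptions{})
		return fmt.Errorf("LFS OID mismatch: got %s, want %s", actual, expectedOID)
	}
	// Safe to tell the GitLab API about the object location now.
	return nil
}
```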
Pros
Server can verify LFS objects without trusting the client
Client remains the same (still talks to workhorse)
Good point. It might also be possible to calculate the checksum before moving the file to a different S3 location, but I guess this might incur bandwidth costs anyway and be more complicated.
/cc @DouweM we'll need to work out how to solve this for GCP Migration and ~"Cloud Native". This is one of a number of points in GitLab where we need to very quickly add support for directly writing to object storage to unblock these projects.
Just in case: I had some working Workhorse code that did a simultaneous upload to an external URL via POST, where the URL was received via /authorize. So, Rails could control where the file would be uploaded :)
@jramsay Yep, I'm aware. I think the most interesting work for this feature would need to happen in gitlab-workhorse though, and ~Platform is severely lacking in Go expertise. In the past, the Workhorse side of LFS has been tackled by @jacobvosmaer-gitlab and @ayufan, but I don't know if they are available right now.