Object Storage Direct Object Uploader

The Direct Object Uploader is a component that works with Extended CarrierWave to upload objects to Object Storage.

The bulk of the upload work is done by Workhorse, to avoid unicorn processes being locked up during the upload stage.

Workhorse saves incoming uploads directly to the object storage backend and then makes an API call to the Ruby backend with a reference to the temporary file.
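Schematically, Workhorse's side of this looks like the sketch below: it streams the file body to a presigned URL and then calls back into the Rails application with metadata instead of the file contents. This is a minimal illustration in Go; the URL and field names (presignedPutURL, finalizeURL, remote_object_path) are hypothetical and not the real Workhorse API.

// Minimal sketch of the Workhorse side, assuming the Rails "authorize" call
// already returned a presigned PUT URL and a finalize endpoint (both names
// are hypothetical, not the real Workhorse API).
package directupload

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
    "os"
)

func uploadAndFinalize(presignedPutURL, finalizeURL, tmpPath string, size int64) error {
    f, err := os.Open(tmpPath)
    if err != nil {
        return err
    }
    defer f.Close()

    // Stream the object body straight to Object Storage via the presigned URL,
    // so the file never has to pass through a Unicorn worker.
    req, err := http.NewRequest(http.MethodPut, presignedPutURL, f)
    if err != nil {
        return err
    }
    req.ContentLength = size
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return err
    }
    resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        return fmt.Errorf("PutObject failed: %s", resp.Status)
    }

    // Tell the Ruby backend where the temporary object ended up, instead of
    // proxying the file contents.
    meta, _ := json.Marshal(map[string]interface{}{
        "remote_object_path": tmpPath, // hypothetical field name
        "size":               size,
    })
    fresp, err := http.Post(finalizeURL, "application/json", bytes.NewReader(meta))
    if err != nil {
        return err
    }
    fresp.Body.Close()
    return nil
}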

All components which use Extended CarrierWave should be able to use the Direct Object Uploader.

At present, only LFS Object Storage uses Extended CarrierWave, so this will be a good first component to move across to Direct Object Storage.

Before attachments, CI artifacts/traces, or other uploads can be moved to direct object storage using this component, they must first support Extended CarrierWave.

API sequence diagrams

The following two diagrams explain the interactions during LFS and artifact uploads.

The diagrams do not have the same level of detail; they can be used as starting points for implementation.

LFS must be a simple subset of the artifacts flow, not two different flows.

LFS

sequenceDiagram
    participant r as user
    participant w as gitlab-workhorse
    participant u as gitlab-unicorn
    participant os as Object Storage

    activate r
    r->>+w: git push with LFS object
    
    w->>+u: authorize
    Note over u,os: Presigning URLs for CarrierWave cache files
    u->>+os: pre-sign PutObject
    os-->>-u: presigned_url
    u->>+os: pre-sign RemoveObject
    os-->>-u: presigned_url
    u-->>-w: presigned_urls

    w->>+os: PutObject
    os-->>-w: result
 
    Note over w,os: Now we hijack the request body with object path and other metadata
    w->>+u: proxy to finalize_upload
    u->>+os: copy cache object to its final location
    os-->>-u: 

    u-->>-r: 
    deactivate r
    Note over w,os: Now we can delete the cache file
    w->>+os: RemoveObject
    os-->>-w: 
    deactivate w
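For the authorize step in the diagram above, the backend returns two presigned URLs: one to PUT the CarrierWave cache object and one to remove it after finalization. The real implementation is Ruby; the sketch below only illustrates the two presign calls, using the AWS S3 SDK for Go, with bucket, key, and expiry as assumptions.

// A sketch of what the "authorize" step hands back to Workhorse: one
// presigned URL to PUT the CarrierWave cache object and one to remove it
// afterwards. The real backend does this in Ruby; this Go version uses
// aws-sdk-go only to illustrate the two presign calls.
package directupload

import (
    "time"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
)

type PresignedURLs struct {
    PutURL    string // used by Workhorse for the PutObject call
    RemoveURL string // used by Workhorse to delete the cache object at the end
}

func presignCacheObject(bucket, key string) (PresignedURLs, error) {
    sess := session.Must(session.NewSession(aws.NewConfig()))
    svc := s3.New(sess)

    // Presign PutObject for the temporary (cache) location.
    putReq, _ := svc.PutObjectRequest(&s3.PutObjectInput{
        Bucket: aws.String(bucket),
        Key:    aws.String(key),
    })
    putURL, err := putReq.Presign(15 * time.Minute)
    if err != nil {
        return PresignedURLs{}, err
    }

    // Presign RemoveObject (DeleteObject in S3 terms) for the later cleanup.
    delReq, _ := svc.DeleteObjectRequest(&s3.DeleteObjectInput{
        Bucket: aws.String(bucket),
        Key:    aws.String(key),
    })
    delURL, err := delReq.Presign(15 * time.Minute)
    if err != nil {
        return PresignedURLs{}, err
    }

    return PresignedURLs{PutURL: putURL, RemoveURL: delURL}, nil
}

Because Workhorse only receives time-limited URLs from this step, it does not need its own object storage credentials.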

Artifacts

This is the artifact upload flow, which is considerably more complex than LFS.

The diagram is not completely accurate; it is only meant to explain the interactions and API calls.

sequenceDiagram
    participant r as gitlab-runner
    participant w as gitlab-workhorse
    participant u as gitlab-unicorn
    participant s as sidekiq
    participant os as Object Storage

    r->>+w: upload_artifact
    
    alt request has Content-Length || Object Storage is Google Cloud Storage
        w->>+u: authorize
        u->>+os: pre-sign PutObject
        os-->>-u: presigned_url
        u-->>-w: presigned_url

        Note over w,os: Only on GCS can we upload without Content-Length (we need to use chunked encoding)
        w->>+os: PutObject
        os-->>-w: result
    else
        loop every 10MB of file
            Note over w: Write part to disk
            w->>+u: authorize
            u->>+os: pre-sign PutObject
            os-->>-u: presigned_url
            u-->>-w: presigned_url

            w->>+os: PutObject
            os-->>-w: result
        end
    end

    w->>+u: upload summary
    opt more than one part
        u-xs: merge_parts
    end

    u-->>-w: 
    w-->>-r: operation result

    opt more than one part
        Note over s,os: The following API calls are not supported by GCS. In any case, we never upload multiple parts to GCS.
        activate s
        s->>+os: CreateMultipartUpload
        os-->>-s: UploadId
        loop each part
            Note over s,os: No download needed, we refer to parts already uploaded to the bucket.
            s->>+os: UploadPartCopy
            os-->>-s: ETag
        end
        s->>+os: CompleteMultipartUpload
        Note over os: Object Storage performs the merge
        os-->>-s: 
        loop each part
            s->>+os: RemoveObject
            os-->>-s: 
        end
        deactivate s
    end
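The asynchronous merge in the final opt block maps onto the S3 multipart API as sketched below: the parts are combined server-side with UploadPartCopy, so sidekiq never downloads or re-uploads data. The real job is Ruby; this Go version only shows the API call sequence, and the bucket, key, and part naming are assumptions. As the diagram notes, this path does not apply to GCS.

// A sketch of the merge performed asynchronously (by sidekiq in the diagram):
// already-uploaded parts are stitched into the final object with
// UploadPartCopy. Bucket, final key, and part keys are assumptions.
package directupload

import (
    "fmt"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/service/s3"
    "github.com/aws/aws-sdk-go/service/s3/s3iface"
)

func mergeParts(svc s3iface.S3API, bucket, finalKey string, partKeys []string) error {
    // CreateMultipartUpload returns the UploadId used by the following calls.
    mpu, err := svc.CreateMultipartUpload(&s3.CreateMultipartUploadInput{
        Bucket: aws.String(bucket),
        Key:    aws.String(finalKey),
    })
    if err != nil {
        return err
    }

    // UploadPartCopy for each part: the source is the part object already in
    // the bucket, so no data flows through the worker.
    var completed []*s3.CompletedPart
    for i, partKey := range partKeys {
        out, err := svc.UploadPartCopy(&s3.UploadPartCopyInput{
            Bucket:     aws.String(bucket),
            Key:        aws.String(finalKey),
            UploadId:   mpu.UploadId,
            PartNumber: aws.Int64(int64(i + 1)),
            CopySource: aws.String(fmt.Sprintf("%s/%s", bucket, partKey)),
        })
        if err != nil {
            return err
        }
        completed = append(completed, &s3.CompletedPart{
            ETag:       out.CopyPartResult.ETag,
            PartNumber: aws.Int64(int64(i + 1)),
        })
    }

    // CompleteMultipartUpload: Object Storage performs the merge server-side.
    _, err = svc.CompleteMultipartUpload(&s3.CompleteMultipartUploadInput{
        Bucket:          aws.String(bucket),
        Key:             aws.String(finalKey),
        UploadId:        mpu.UploadId,
        MultipartUpload: &s3.CompletedMultipartUpload{Parts: completed},
    })
    if err != nil {
        return err
    }

    // RemoveObject for each temporary part once the merge has succeeded.
    for _, partKey := range partKeys {
        if _, err := svc.DeleteObject(&s3.DeleteObjectInput{
            Bucket: aws.String(bucket),
            Key:    aws.String(partKey),
        }); err != nil {
            return err
        }
    }
    return nil
}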

Provider support

| Provider  | chunked encoding | MultiPart/UploadPart - Copy |
|-----------|------------------|------------------------------|
| Google CS | yes              | no                           |
| AWS S3    | no               | yes                          |
| minio     | no               | yes                          |
| ceph      | no               | it should, once ceph!20002 is merged; for more details see ceph#22729 |
| DO Spaces | no               | no; they claim to support MultiPart Upload, but their API documentation has no UploadPart - Copy |