
Changes how project export tarballs are uploaded to an external website

Rodrigo Tomonari requested to merge rodrigo/31744-remote-upload-stream into master

What does this MR do and why?

When a user requests a project export via Import/Export, they can ask for the export tarball to be uploaded to an external website. When object storage is enabled, the upload uses `GitLab::HttpIO#read` to stream-read the file from object storage. However, `GitLab::HttpIO#read` doesn't perform well with large files because it makes many HTTP requests, each reading a small chunk of the file (see more detail about the problem in #31744 (comment 975715114)).

This change introduces a different method to stream files from object storage that establishes only one HTTP connection to the object store and streams the file from the underlying socket. The file is downloaded in chunks of 128KB, and each chunk is uploaded to the external website before the next one is downloaded. This repeats until the entire file has been read and uploaded.
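As a rough illustration of the approach, here is a minimal sketch (hypothetical code, not the actual implementation in this MR; the URLs, the `IO.pipe` plumbing, and the chunked `Transfer-Encoding` upload are assumptions made for the example):

```ruby
require 'net/http'
require 'uri'

# Placeholder URLs for the example
source = URI('https://object-store.example.com/export.tar.gz')
target = URI('https://external-website.example.com/upload')

reader, writer = IO.pipe

# Download side: a single GET whose body is streamed into the pipe
# chunk by chunk as it arrives on the socket.
downloader = Thread.new do
  begin
    Net::HTTP.start(source.host, source.port, use_ssl: source.scheme == 'https') do |http|
      http.request(Net::HTTP::Get.new(source)) do |response|
        response.read_body { |chunk| writer.write(chunk) }
      end
    end
  ensure
    writer.close # signals EOF to the upload side
  end
end

# Upload side: a single PUT whose body is read from the pipe, so each
# downloaded chunk is sent on before the next one is fetched.
Net::HTTP.start(target.host, target.port, use_ssl: target.scheme == 'https') do |http|
  request = Net::HTTP::Put.new(target)
  request['Transfer-Encoding'] = 'chunked' # total size is not known up front
  request.body_stream = reader
  http.request(request)
end

downloader.join
```

The real change reads fixed 128KB chunks from the download socket; `read_body` above yields whatever the socket delivers, so the sketch only approximates that buffering behavior.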

For now, the new method is behind a feature flag so that we can test it for a while before making it the default.

Comparison

Below are some comparisons of how long it took to upload an export file using the different methods in my local environment (150MBps connection):

| Method / File size | 15MB | 400MB |
|--------------------|------|-------|
| `GitLab::HttpIO` | ~70 seconds | Took more than 10 minutes and failed with `Gitlab::HttpIO::FailedToGetChunkError` |
| Download to disk and upload from disk | ~10 seconds | ~80 seconds |
| Remote Stream - 8MB buffer | ~10 seconds | ~101 seconds |
| Remote Stream - 2MB buffer | ~15 seconds | ~85 seconds |
| Remote Stream - 1MB buffer | ~11 seconds | ~74 seconds |
| Remote Stream - 128KB buffer | ~11 seconds | ~75 seconds |

I chose a buffer size of 128KB because I didn't notice an increase in upload speed with a larger buffer; in fact, a larger buffer made the upload a bit slower.

Related to: #31744 (closed)

Kudos to Kamil for suggesting the idea and sharing an example of the solution 🎉

Screenshots or screen recordings

These are strongly recommended to assist reviewers and reduce the time to merge your change.

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

  1. Enable the new method in the Rails console (see the note after this list for checking or rolling back the flag):

     ```ruby
     Feature.enable(:import_export_web_upload_stream)
     ```
  2. Configure the local environment to use object storage (see the GDK documentation to enable object storage).
  3. Request the project export via the API, providing the upload URL for the external website in the request:

     ```shell
     curl --location --request POST 'http://gdk.test:3000/api/v4/projects/[ID]/export' \
     --header 'PRIVATE-TOKEN: [TOKEN]' \
     --header 'Content-Type: application/json' \
     --data-raw '{
         "upload": {
             "url": "[EXTERNAL_URL]",
             "http_method": "PUT"
         }
     }'
     ```

A presigned URL for S3 can be generated using the snippet below:

```ruby
#!/usr/bin/env ruby

require 'aws-sdk-s3'

# Replace these with your bucket details and credentials
bucket_name = 'bucket_example'
object_key = 'export.tar.gz'
access_key_id = 'access_key_id'
secret_access_key = 'secret_access_key'
expiration_time = 10_000 # URL validity in seconds

client = Aws::S3::Client.new(region: 'us-east-1', secret_access_key: secret_access_key, access_key_id: access_key_id)
bucket = Aws::S3::Bucket.new(bucket_name, client: client)

# Print a presigned PUT URL for the export upload to target
puts bucket.object(object_key).presigned_url(:put, expires_in: expiration_time)
```
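To use the snippet, install the `aws-sdk-s3` gem (`gem install aws-sdk-s3`), save it to a file, run it with `ruby`, and use the printed URL as the `[EXTERNAL_URL]` in the export request above.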
  4. Wait for the project to be exported and uploaded to the external URL.
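To verify the flag state or roll back after testing, the standard feature flag helpers can be used from the Rails console (a small sketch; only the flag name comes from this MR):

```ruby
# In the Rails console:
Feature.enabled?(:import_export_web_upload_stream) # => true while the new method is active
Feature.disable(:import_export_web_upload_stream)  # revert to the GitLab::HttpIO path
```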

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
