Fix large S3 uploads failing to finalize

Stan Hu requested to merge sh-accelerate-s3-copy into master

When large files are uploaded to object storage by Workhorse, CarrierWave is responsible for copying these files from their temporary location to a final location. However, if the file is above 5 GB, the upload will fail outright because AWS requires multipart uploads to be used to copy files above that limit.

Even if multipart uploads were used, files containing several gigabytes of data would usually fail to complete within the 60-second Web request timeout. In one test, a 6 GB file took several minutes to copy with fog-aws, while it only took 36 seconds with the aws CLI. The main difference: multithreading.

fog-aws now supports multipart, multithreaded uploads via recently merged upstream pull requests.
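For illustration, here is a minimal sketch of driving fog-aws directly with the new behavior. The bucket name, object keys, and chunk size are made up, and it assumes fog-aws honors the `multipart_chunk_size` and `concurrency` attributes on the file model when copying, per the upstream changes:

```ruby
require 'fog/aws'

# Connection details are illustrative.
connection = Fog::Storage.new(
  provider:              'AWS',
  aws_access_key_id:     ENV['AWS_ACCESS_KEY_ID'],
  aws_secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'],
  region:                'us-east-1'
)

directory = connection.directories.get('example-uploads')
file      = directory.files.get('tmp/uploads/large-artifact.tar.gz')

# With the upstream changes, large copies are split into parts and copied
# with multiple threads instead of a single copy_object request.
file.multipart_chunk_size = 100 * 1024 * 1024 # 100 MB parts (illustrative)
file.concurrency          = 10                # number of copy threads

file.copy('example-uploads', 'final/large-artifact.tar.gz')
```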

For this to work, we also need to patch CarrierWave to use the `File#copy` method instead of the Fog connection's `copy_object` method. We use a concurrency of 10 threads because this is what the AWS SDK uses, and it appears to give good performance for large uploads.

This is gated behind the `s3_multithreaded_uploads` feature flag. We enable it by default because GitLab.com uses Google Cloud Storage, so this AWS-specific code path does not affect it.
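A rough sketch of the patched copy path follows. The method and instance variables mirror CarrierWave's `CarrierWave::Storage::Fog::File#copy_to`, and it assumes fog-aws exposes a `concurrency` attribute on its file model; treat it as illustrative rather than the exact diff in the linked pull request:

```ruby
# Illustrative patch of CarrierWave::Storage::Fog::File#copy_to.
# `copy_to_options`, `connection`, and `@uploader` come from CarrierWave;
# Feature.enabled? is GitLab's feature flag check.
def copy_to(new_path)
  if ::Feature.enabled?(:s3_multithreaded_uploads, default_enabled: true) &&
      @uploader.fog_credentials[:provider] == 'AWS'
    # Multipart, multithreaded copy via fog-aws.
    # 10 threads matches the AWS SDK default.
    file.concurrency = 10
    file.copy(@uploader.fog_directory, new_path, copy_to_options)
  else
    # Previous behavior: a single copy_object request, which fails above 5 GB.
    connection.copy_object(@uploader.fog_directory, file.key,
                           @uploader.fog_directory, new_path, copy_to_options)
  end

  CarrierWave::Storage::Fog::File.new(@uploader, @base, new_path)
end
```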

Relates to #216442 (closed)

Upstream pull request: https://github.com/carrierwaveuploader/carrierwave/pull/2526
