When large files are uploaded to object storage by Workhorse, CarrierWave is responsible for copying these files from their temporary location to a final location. However, if the file is above 5 GB, the upload will fail outright because AWS requires multipart uploads to be used to copy files above that limit.
Even if multipart uploads were used, files containing several gigabytes of data would usually fail to complete within the 60-second Web request timeout. In one test, a 6 GB file took several minutes to copy with fog-aws, while it only took 36 seconds with the aws CLI. The main difference: multithreading.
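To make the multipart idea concrete, here is a minimal sketch of how an object could be split into byte ranges, one per part, so that no single copy request covers the whole object. The `part_ranges` helper and the 10 MiB chunk size are assumptions chosen for illustration; they are not taken from fog-aws or CarrierWave.

```ruby
# Hypothetical helper (not a fog-aws or CarrierWave API): split an object of
# `size` bytes into inclusive byte ranges of at most `chunk_size` bytes each.
# Every range would correspond to one part of a multipart copy.
def part_ranges(size, chunk_size)
  raise ArgumentError, "chunk_size must be positive" unless chunk_size.positive?

  (0...size).step(chunk_size).map do |first|
    # The final part may be shorter than chunk_size.
    last = [first + chunk_size, size].min - 1
    (first..last)
  end
end

GIB = 1024**3
MIB = 1024**2

# Assumed values: a 6 GiB object copied in 10 MiB parts.
ranges = part_ranges(6 * GIB, 10 * MIB)
puts "parts: #{ranges.length}"
puts "first: #{ranges.first}"
puts "last:  #{ranges.last}"
```

Because the parts are independent byte ranges, they can be copied in any order, which is what makes a multithreaded copy possible.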
fog-aws now supports multipart, multithreaded uploads per these pull requests:
For this to work, we also need to patch CarrierWave to use the File#copy method instead of the Fog connection method. We use a concurrency of 10 threads because this is what the AWS SDK uses, and it appears to give good performance for large uploads.
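The effect of a fixed concurrency can be sketched with plain Ruby threads. This is only an illustration of the general technique, not the fog-aws implementation: `copy_parts_concurrently` is a hypothetical name, and the block passed to it stands in for whatever call copies a single part.

```ruby
# Hypothetical sketch (not a fog-aws or CarrierWave API): process each part on
# a fixed-size pool of worker threads and return the results in part order.
def copy_parts_concurrently(parts, concurrency: 10)
  queue = Queue.new
  parts.each_with_index { |part, index| queue << [index, part] }
  queue.close # once the queue is drained, #pop returns nil and workers exit

  results = Array.new(parts.length)
  workers = Array.new([concurrency, parts.length].min) do
    Thread.new do
      while (job = queue.pop)
        index, part = job
        # Each worker writes to its own slot, so results keep the part order.
        results[index] = yield(part)
      end
    end
  end

  workers.each(&:join) # join re-raises an exception raised inside a worker
  results
end

# Usage with stand-in work: a real caller would copy the byte range here.
parts = [0..9, 10..19, 20..24]
copied = copy_parts_concurrently(parts, concurrency: 10) do |range|
  "copied bytes #{range.first}-#{range.last}"
end
puts copied
```

With 10 workers, up to 10 parts are in flight at once, so the total time is bounded by network throughput rather than by the latency of each sequential request.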
This is gated behind the s3_multithreaded_uploads feature flag. We enable it by default because GitLab.com uses Google Cloud Storage.
Relates to #216442
Upstream pull request: https://github.com/carrierwaveuploader/carrierwave/pull/2526