
Add support for S3 multipart upload with pycurl

Adam Coldrick requested to merge sotk/cas/multipart-upload into master

Before raising this MR, consider whether the following are required, and complete if so:

  • Unit tests
  • [-] Metrics
  • [-] Documentation update(s)

If not required, please explain in brief why not.

Description

This MR extends our pycurl-based S3 utilities to also support multipart upload. This allows uploading objects larger than 5GB, and provides a way to parallelise the upload of objects of any size. Previously we supported this because boto3 used multipart upload automatically when uploading objects, but that support was lost in the switch to using pycurl to send S3 requests.
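
As a rough illustration, the flow is: create the multipart upload, presign an UploadPart URL for each chunk, PUT each chunk with pycurl while capturing its ETag, then complete the upload. The sketch below is not the MR's actual code; the bucket name, key, and part size are placeholders, and completion is shown with the boto3 client for the reason described in the next paragraph.

import io

import boto3
import pycurl

PART_SIZE = 64 * 1024 * 1024  # every part except the last must be at least 5MB

s3 = boto3.client("s3")
upload = s3.create_multipart_upload(Bucket="my-bucket", Key="big-input.txt")
upload_id = upload["UploadId"]

parts = []
with open("big-input.txt", "rb") as blob:
    part_number = 1
    while True:
        chunk = blob.read(PART_SIZE)
        if not chunk:
            break

        # Presign the UploadPart request so pycurl can PUT the bytes directly.
        url = s3.generate_presigned_url(
            "upload_part",
            Params={
                "Bucket": "my-bucket",
                "Key": "big-input.txt",
                "UploadId": upload_id,
                "PartNumber": part_number,
            },
        )

        headers = {}

        def collect_header(line, headers=headers):
            # Response header lines arrive as bytes, e.g. b'ETag: "abc..."\r\n'.
            decoded = line.decode("iso-8859-1")
            if ":" in decoded:
                name, value = decoded.split(":", 1)
                headers[name.strip().lower()] = value.strip()

        curl = pycurl.Curl()
        curl.setopt(pycurl.URL, url)
        curl.setopt(pycurl.UPLOAD, 1)
        curl.setopt(pycurl.READFUNCTION, io.BytesIO(chunk).read)
        curl.setopt(pycurl.INFILESIZE, len(chunk))
        curl.setopt(pycurl.HEADERFUNCTION, collect_header)
        curl.perform()
        curl.close()

        # S3 needs each part's ETag to assemble the object on completion.
        parts.append({"PartNumber": part_number, "ETag": headers["etag"]})
        part_number += 1

# CompleteMultipartUpload is sent with the boto3 client directly (see below).
s3.complete_multipart_upload(
    Bucket="my-bucket",
    Key="big-input.txt",
    UploadId=upload_id,
    MultipartUpload={"Parts": parts},
)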

The implementation mostly uses presigned URLs, except for sending the CompleteMultipartUpload request. This final request uses the boto3 client directly, due to a bug in the URL presigning for this request when using s3v4 auth. This MR also updates our S3 client to use s3v4 auth since the demo S3-ish server we use in our docker-compose examples, MinIO, requires it for CreateMultipartUpload requests.
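
For reference, switching the client to s3v4 auth is a one-line botocore Config change; the endpoint and credentials below are placeholders, not the values from our compose files.

import boto3
from botocore.config import Config

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",
    aws_access_key_id="my-access-key",
    aws_secret_access_key="my-secret-key",
    config=Config(signature_version="s3v4"),
)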

This MR also includes changes to the internal CAS client to support uploading large blobs to a BuildGrid with a Remote CAS backend.

Validation

Spin up a BuildGrid with an S3 backend, e.g. our multi-level CAS example.

docker-compose -f docker-compose-examples/multi-level-cache.yml up

Create a large file and upload it with a CAS client.

head -c 6G /dev/urandom > big-input.txt
tox -e venv -- bgd cas upload-file big-input.txt
