Add support for S3 multipart upload with pycurl
Before raising this MR, consider whether the following are required, and complete if so:
- [-] Unit tests
- [-] Metrics
- [-] Documentation update(s)
If not required, please explain in brief why not.
## Description
This MR extends our pycurl-based S3 utilities to also support multipart upload. This allows uploading objects larger than 5GB, and also provides a way to parallelise the upload of objects of any size. Previously we got this behaviour for free from boto3, which uses multipart upload automatically when uploading large objects, but it was lost in the switch to using pycurl to send S3 requests.
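As a sketch of how an object can be split for parallel upload, the part layout can be computed up front. The helper below is illustrative only (it is not taken from this MR's code); the 5 MiB minimum part size and 10,000-part maximum are S3's documented multipart limits:

```python
# Hypothetical helper (not from this MR): split an object of `size` bytes
# into S3 multipart upload ranges. S3 requires every part except the last
# to be at least 5 MiB, and allows at most 10,000 parts per upload.
MIN_PART_SIZE = 5 * 1024 * 1024
MAX_PARTS = 10_000

def part_ranges(size, part_size=64 * 1024 * 1024):
    """Yield (part_number, start_offset, end_offset) tuples, end exclusive."""
    part_size = max(part_size, MIN_PART_SIZE)
    # Grow the part size if the object would otherwise exceed 10,000 parts.
    if size > part_size * MAX_PARTS:
        part_size = -(-size // MAX_PARTS)  # ceiling division
    for number, start in enumerate(range(0, size, part_size), start=1):
        yield number, start, min(start + part_size, size)
```

Each range can then be uploaded independently (for example via presigned `UploadPart` URLs, one per part), with the parts stitched together by a final `CompleteMultipartUpload` request.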
The implementation mostly uses presigned URLs, except for the final `CompleteMultipartUpload` request. That request uses the boto3 client directly, due to a bug in URL presigning for this request when using `s3v4` auth. This MR also updates our S3 client to use `s3v4` auth, since MinIO, the S3-compatible server we use in our docker-compose examples, requires it for `CreateMultipartUpload` requests.
This MR also includes changes to the internal CAS client, to support uploading large blobs to a BuildGrid with a Remote CAS backend.
## Validation
Spin up a BuildGrid with an S3 backend, e.g. our multi-level CAS example:
docker-compose -f docker-compose-examples/multi-level-cache.yml up
Create a large file and upload it with a CAS client:
head -c 6G /dev/urandom > big-input.txt
tox -e venv -- bgd cas upload-file big-input.txt