# Spike: How to address the feature request to support AWS S3 multipart uploads (cache-archiver)?
## Scope

- Define a viable solution to address the feature request to support AWS S3 multipart uploads (cache-archiver).
## Problems / issues to consider

- In container-based environments, we can't expect the user to have the right tools installed in their image.
- `gitlab-runner` and the job might run in two completely different environments, for example, different VMs. In that scenario, we can't access the cache on disk without running remote commands on that machine by passing credentials.
- We need the `cache-uploader` command because we need a way to tell the remote environment, "hey, upload the cache", and we can't do that without our own command bundled inside the helper image, with the same job directory mounted to it so it can upload the cache.
- Today, the runner uploads the cache as one big blob to a pre-signed upload URL. Using a pre-signed URL means we don't need to share S3 credentials with the job environment, but it is definitely less efficient than what the AWS CLI does when used directly. How much can we improve that without requiring us to push the cache object storage credentials to the job environment?
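One direction worth exploring: S3 allows pre-signing individual `UploadPart` requests, so GitLab could hand the job environment one pre-signed URL per part and the helper would only need to split the archive and `PUT` each chunk, with no credentials involved. Below is a minimal sketch of that upload loop. The `urlFor` helper, the tiny `partSize`, and the `httptest` server standing in for S3 are all assumptions for illustration; a real implementation would use S3's 5 MiB minimum part size and would also need the `CreateMultipartUpload`/`CompleteMultipartUpload` calls to be pre-signed or proxied by GitLab.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// partSize is deliberately tiny so the example runs instantly; S3 requires
// every part except the last to be at least 5 MiB.
const partSize = 8

// splitParts returns the [start, end) byte ranges for each part of an
// object of the given total size.
func splitParts(total, part int) [][2]int {
	var ranges [][2]int
	for off := 0; off < total; off += part {
		end := off + part
		if end > total {
			end = total
		}
		ranges = append(ranges, [2]int{off, end})
	}
	return ranges
}

// uploadParts PUTs each chunk of data to its own pre-signed part URL.
// urlFor is a hypothetical helper that would return the pre-signed
// UploadPart URL for a given part number.
func uploadParts(data []byte, urlFor func(partNumber int) string) error {
	for i, r := range splitParts(len(data), partSize) {
		req, err := http.NewRequest(http.MethodPut, urlFor(i+1), bytes.NewReader(data[r[0]:r[1]]))
		if err != nil {
			return err
		}
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			return err
		}
		resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			return fmt.Errorf("part %d: unexpected status %d", i+1, resp.StatusCode)
		}
	}
	return nil
}

func main() {
	received := 0
	// Stand-in for S3: accepts PUTs of individual parts.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		n, _ := io.Copy(io.Discard, r.Body)
		received += int(n)
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	data := bytes.Repeat([]byte("x"), 20) // 20 bytes -> parts of 8, 8, 4
	urlFor := func(part int) string {
		return fmt.Sprintf("%s/upload?partNumber=%d", srv.URL, part)
	}
	if err := uploadParts(data, urlFor); err != nil {
		panic(err)
	}
	fmt.Printf("uploaded %d bytes in %d parts\n", received, len(splitParts(len(data), partSize)))
}
```

The sketch uploads parts sequentially; the efficiency gain over a single-blob pre-signed PUT would come from uploading parts concurrently and from being able to retry an individual failed part instead of the whole archive.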