Support multi-threaded download/upload of cache and artifacts
Description
Currently we use MinIO as object storage and everything works fine. However, because the project itself is a Node.js project, the cache exceeds 800 MB.
I looked at the source code and found that the cache is downloaded in a single thread:
https://gitlab.com/gitlab-org/gitlab-runner/blob/master/commands/helpers/cache_extractor.go#L92-104
https://gitlab.com/gitlab-org/gitlab-runner/blob/master/commands/helpers/cache_extractor.go#L70
Whether the backend is S3, MinIO, or another storage service, it should in theory support multi-threaded downloads. With the current single-threaded download, fetching a large cache takes a very long time.
Multi-threaded uploads would be even better, if you are willing to consider supporting them.
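To illustrate the idea, here is a minimal sketch of a multi-threaded download that splits the object into byte ranges and fetches them concurrently with HTTP `Range` requests. It is not taken from the GitLab Runner code: `splitRanges`, `fetchRange`, and `downloadParallel` are hypothetical names, the local test server stands in for the storage backend, and the sketch assumes the download URL honors `Range` headers and that the object size is known in advance.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// byteRange is an inclusive byte interval, as used by the HTTP Range header.
type byteRange struct{ start, end int64 }

// splitRanges divides [0, size) into at most `parts` contiguous ranges.
func splitRanges(size, parts int64) []byteRange {
	if size <= 0 || parts <= 0 {
		return nil
	}
	chunk := (size + parts - 1) / parts // ceiling division
	var out []byteRange
	for start := int64(0); start < size; start += chunk {
		end := start + chunk - 1
		if end >= size {
			end = size - 1
		}
		out = append(out, byteRange{start, end})
	}
	return out
}

// fetchRange downloads one range and writes it into buf at the matching offset.
func fetchRange(url string, r byteRange, buf []byte) error {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", r.start, r.end))
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusPartialContent {
		return fmt.Errorf("range %d-%d: unexpected status %s", r.start, r.end, resp.Status)
	}
	_, err = io.ReadFull(resp.Body, buf[r.start:r.end+1])
	return err
}

// downloadParallel fetches all ranges concurrently and returns the assembled body.
// A real implementation would write to a file (for example with WriteAt)
// instead of holding an 800 MB archive in memory.
func downloadParallel(url string, size, parts int64) ([]byte, error) {
	buf := make([]byte, size)
	ranges := splitRanges(size, parts)
	errs := make([]error, len(ranges))
	var wg sync.WaitGroup
	for i, r := range ranges {
		wg.Add(1)
		go func(i int, r byteRange) {
			defer wg.Done()
			errs[i] = fetchRange(url, r, buf)
		}(i, r)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return buf, nil
}

func main() {
	// Local stand-in for the object store; http.ServeContent handles Range requests.
	payload := bytes.Repeat([]byte("cache-archive-"), 1000)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.ServeContent(w, r, "cache.zip", time.Time{}, bytes.NewReader(payload))
	}))
	defer srv.Close()

	got, err := downloadParallel(srv.URL, int64(len(payload)), 4)
	if err != nil {
		panic(err)
	}
	fmt.Println("download matches original:", bytes.Equal(got, payload))
}
```

Each goroutine writes to a disjoint slice of the buffer, so no locking is needed beyond the `WaitGroup`. The number of parts is a tuning knob; too many concurrent requests may be throttled by the storage backend.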
Discussions to date on the feasibility of implementing this proposal (revised 2022-01-11)
- This proposal would certainly improve performance, but there are technical reasons why it cannot be done at the moment. This was recently discussed, but we haven't decided on a way forward yet.
- Multipart uploads don't work particularly well with pre-signed requests, which is what we use now: GitLab Runner pre-generates the upload URL and provides it to the job. We do this because we don't want to reveal the authentication token to the job, since that would expose the entire bucket to it.
- Multipart uploads require more communication with the cloud storage provider than a single pre-signed request, and they require the authentication token.
The solution we want is one that works across all executors, but that becomes quite complex:
- We could proxy all requests through a service that performs the authentication, without revealing the authentication token to the job.
- We could allow jobs to communicate with Runner and have it pre-sign each multipart upload chunk URL. But GitLab jobs don't currently communicate with Runner like this, so it would be a very large change.
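The second option can be sketched roughly as follows, assuming an S3-style multipart API in which each part is uploaded with its own `PUT` and the response carries an `ETag`. Everything here is hypothetical and not part of GitLab Runner: the `PartSigner` interface models a Runner-side component that holds the credentials and returns one pre-signed URL per part, `fakeSigner` and the local test server stand in for Runner and the storage provider, and the part size is arbitrary.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
)

// PartSigner is a hypothetical Runner-side service: it holds the credentials
// and hands out one pre-signed URL per part, so the job never sees the token.
type PartSigner interface {
	SignPart(partNumber int) (url string, err error)
}

// partCount returns how many parts of partSize bytes are needed for size bytes.
func partCount(size, partSize int64) int {
	if size <= 0 || partSize <= 0 {
		return 0
	}
	return int((size + partSize - 1) / partSize)
}

// putPart asks the signer for a URL and uploads one part to it.
func putPart(signer PartSigner, partNumber int, body []byte) (string, error) {
	url, err := signer.SignPart(partNumber)
	if err != nil {
		return "", err
	}
	req, err := http.NewRequest(http.MethodPut, url, bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	io.Copy(io.Discard, resp.Body)
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("part %d: unexpected status %s", partNumber, resp.Status)
	}
	return resp.Header.Get("ETag"), nil
}

// uploadParts uploads all parts concurrently and returns the ETag reported
// for each part, indexed by part number minus one.
func uploadParts(signer PartSigner, data []byte, partSize int64) ([]string, error) {
	n := partCount(int64(len(data)), partSize)
	etags := make([]string, n)
	errs := make([]error, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			start := int64(i) * partSize
			end := start + partSize
			if end > int64(len(data)) {
				end = int64(len(data))
			}
			etags[i], errs[i] = putPart(signer, i+1, data[start:end])
		}(i)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return etags, nil
}

// fakeSigner stands in for Runner; it "signs" by appending the part number.
type fakeSigner struct{ base string }

func (f fakeSigner) SignPart(n int) (string, error) {
	return fmt.Sprintf("%s/upload?partNumber=%d", f.base, n), nil
}

func main() {
	// Local stand-in for the storage provider: accepts a part and returns an ETag.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, _ := io.ReadAll(r.Body)
		etag := fmt.Sprintf("part-%s-%d", r.URL.Query().Get("partNumber"), len(body))
		w.Header().Set("ETag", fmt.Sprintf("%q", etag))
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	etags, err := uploadParts(fakeSigner{srv.URL}, make([]byte, 25), 10)
	if err != nil {
		panic(err)
	}
	fmt.Println(etags)
}
```

In this sketch, starting the multipart upload and completing it with the collected ETags would still have to happen on the Runner side, since both steps need credentials. The `SignPart` round trip per chunk is exactly the job-to-Runner communication that does not exist today.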