WIP: Reimplement S3 parallelism

Richard Kennedy requested to merge richken/new_s3_parrallelism into master

Description

A previous MR added support for parallelizing S3 requests. It was later reverted for a few reasons:

  1. There was no way to turn off the parallelism
  2. A large request could monopolize all the threads, starving other requests
  3. Possibly because of 2, there was no significant improvement in performance

This MR attempts to resolve those issues by using a BoundedThreadPool executor instead of a regular ThreadPool executor. When no threads are free, submitting a request to the BoundedThreadPool executor fails, and the S3Storage class falls back to submitting jobs serially. This keeps requests from blocking when all threads are in use. In addition, a single large job can no longer monopolize the executor: any one request can use at most a fixed number of threads at any point in time (specified by self._max_s3_executor_load).
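
For reviewers, the reject-then-fall-back behaviour described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this MR: `BoundedExecutor`, `PoolFull`, `run_jobs` and the `max_load` parameter are hypothetical names, with `max_load` standing in for `self._max_s3_executor_load`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PoolFull(Exception):
    """Raised when every worker thread is already in use (hypothetical name)."""


class BoundedExecutor:
    """Wraps a ThreadPoolExecutor and rejects work instead of queueing it."""

    def __init__(self, max_workers):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        # One slot per worker; a slot is held from submit until the job finishes.
        self._free_slots = threading.BoundedSemaphore(max_workers)

    def submit(self, fn, *args, **kwargs):
        # Non-blocking acquire: fail immediately rather than wait for a thread.
        if not self._free_slots.acquire(blocking=False):
            raise PoolFull("no free worker threads")
        try:
            future = self._pool.submit(fn, *args, **kwargs)
        except BaseException:
            self._free_slots.release()
            raise
        future.add_done_callback(lambda _: self._free_slots.release())
        return future

    def shutdown(self, wait=True):
        self._pool.shutdown(wait=wait)


def run_jobs(executor, jobs, max_load):
    """Run zero-argument callables, using at most max_load threads at once.

    Jobs that cannot get a thread (pool full or per-request cap reached)
    are executed serially in the calling thread. Results keep job order.
    """
    results = [None] * len(jobs)
    in_flight = {}  # future -> index of the job it is running
    for index, job in enumerate(jobs):
        # Collect results of futures that have already finished.
        for future in [f for f in in_flight if f.done()]:
            results[in_flight.pop(future)] = future.result()
        if len(in_flight) < max_load:
            try:
                in_flight[executor.submit(job)] = index
                continue
            except PoolFull:
                pass  # fall through to the serial path
        results[index] = job()  # serial fallback
    for future, index in in_flight.items():
        results[index] = future.result()
    return results
```

With this shape, a request never waits for a thread: it either gets one immediately or does the work itself, and the per-request cap keeps one large request from occupying the whole pool.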

Changes proposed in this merge request:

  • Reimplement S3 Parallelism

Additional work to be done in a separate MR

  • Separate ThreadPool Executors for reading/writing (To Do)
Edited by Marios Hadjimichael