Update/refactor for performance
This refactor is designed to keep worker threads in use throughout the run, rather than only in parts of it, to improve performance. To demonstrate the gain, here are timings for the refactor compared with the current version:
Note: these timings were measured while downloading the following packages: mxnet-cu80 lalsuite frida tensorflow-io-nightly tensorflow-gpu cupy-cuda91 cupy-cuda90
- Current version, using the parallel utility, with a thread count of 1: 462m47
- Current version with a thread count of 15: 401m4
- This refactor with a thread count of 15: 326m7
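
To put the timings above in relative terms, here is a small sketch that computes the reduction in run time. It assumes the figures are minutes followed by seconds (as in `time` output); that reading is an assumption, and the helper names `to_seconds` and `reduction_pct` are made up for this illustration.

```python
import re


def to_seconds(stamp: str) -> int:
    """Convert a string such as '462m47' to seconds.

    Assumes the format is <minutes>m<seconds>, with an optional trailing 's'.
    """
    match = re.fullmatch(r"(\d+)m(\d+)s?", stamp)
    if match is None:
        raise ValueError(f"unrecognized timing: {stamp!r}")
    minutes, seconds = (int(part) for part in match.groups())
    return minutes * 60 + seconds


def reduction_pct(before: int, after: int) -> float:
    """Percentage of run time saved going from `before` to `after`."""
    return (before - after) / before * 100


# Timings copied from the list above.
current_1_thread = to_seconds("462m47")
current_15_threads = to_seconds("401m4")
refactor_15_threads = to_seconds("326m7")

vs_same_threads = reduction_pct(current_15_threads, refactor_15_threads)
vs_single_thread = reduction_pct(current_1_thread, refactor_15_threads)

print(f"Refactor vs current (15 threads): {vs_same_threads:.1f}% less time")
print(f"Refactor vs current (1 thread):   {vs_single_thread:.1f}% less time")
```

Under that assumption, the refactor takes roughly 18.7% less time than the current version at the same thread count, and roughly 29.5% less than the single-threaded run. These are single runs, so the numbers are indicative rather than a rigorous benchmark.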