[Feature Request] Multithreaded Numpy Detection for Eternal Math Glory
This feature request does not assume that issues #4 (closed) (General-purpose Profiling) and #5 (closed) (Show Us the Bottlenecks) have been resolved. Why? Because we already definitively know that Numpy's dot product is a bottleneck for BETSE. And bottlenecks chap my hide.
Numpy's dot product is either:
- Accelerated (i.e., multithreaded), in which case good things happen. Numpy and hence BETSE itself will be accelerated, implying that BETSE is already parallelized in a portion of its critical path across multiple cores and hyperthreads of the same machine.
- Unaccelerated (i.e., non-multithreaded), in which case bad things happen. Numpy and hence BETSE itself will be unaccelerated.
Which of these two scenarios is more likely for most end users? Naturally, the latter. Numpy is usually linked against unaccelerated BLAS and LAPACK libraries by default. Linking Numpy against accelerated BLAS and LAPACK libraries is non-trivial and, arguably, exceeds the limited scope of BETSE. Hence, the worst of all possible worlds is unfolding here and now.
Can BETSE Do Anything or Should We Just Give Up?
We should just give up.
Only kidding! Of course BETSE can do something, silly. BETSE can (in order):
- Conditionally query Numpy at runtime for whether Numpy has been linked against an accelerated or unaccelerated BLAS library. LAPACK acceleration is significantly less critical and possibly even ignorable for our purposes. For:
  - Numpy < 1.11.0, doing so presumably relates to the numpy.alterdot() and numpy.restoredot() functions. To quote the official documentation for the former: "If Numpy is built with an accelerated BLAS, the numpy.alterdot() function is automatically called when Numpy is imported. When Numpy is built with an accelerated BLAS like ATLAS, these functions are replaced to make use of the faster implementations."
  - Numpy >= 1.11.0, we are ignorance incarnate. The numpy.alterdot() and numpy.restoredot() functions have been reduced to noops in newer versions of Numpy. Similar functionality must reside elsewhere in the Numpy codebase, however. The numpy.__config__ API appears to provide the requisite metadata, albeit in an inconvenient format. A more reliable methodology not requiring string matching would be preferred. Investigate the deepest black magic!
- If Numpy reports that it has been linked against an unaccelerated BLAS library, log a non-fatal warning from BETSE.
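The two steps above might look something like the following. This is only a rough sketch, not a vetted implementation, and it still leans on the string matching bemoaned above. Everything named here is hypothetical: the helper functions, the "betse" logger name, and the list of substrings treated as "accelerated" are all assumptions. It also assumes that older Numpy releases expose a blas_opt_info dictionary on numpy.__config__ and that newer releases accept numpy.show_config(mode="dicts"); where neither holds, it simply reports nothing.

```python
import logging

# Substrings assumed (not exhaustively!) to identify accelerated BLAS libraries.
ACCELERATED_BLAS_NAMES = ("mkl", "blis", "openblas", "atlas")


def is_blas_accelerated(library_names):
    """Return True if any library name resembles an accelerated BLAS."""
    return any(
        known in name.lower()
        for name in library_names
        for known in ACCELERATED_BLAS_NAMES
    )


def get_numpy_blas_library_names():
    """Best-effort list of the BLAS library names Numpy was linked against."""
    try:
        import numpy
    except ImportError:
        return []

    # Assumption: older Numpy releases expose a "blas_opt_info" dictionary.
    config = getattr(numpy, "__config__", None)
    legacy_info = getattr(config, "blas_opt_info", None)
    if isinstance(legacy_info, dict):
        return [str(name) for name in legacy_info.get("libraries", [])]

    # Assumption: newer Numpy releases return build metadata as a dictionary.
    try:
        build_info = numpy.show_config(mode="dicts")
    except TypeError:  # this show_config() accepts no "mode" argument
        return []
    if not isinstance(build_info, dict):
        return []
    blas = build_info.get("Build Dependencies", {}).get("blas", {})
    name = blas.get("name") if isinstance(blas, dict) else None
    return [str(name)] if name else []


def warn_if_blas_unaccelerated(library_names, logger=None):
    """Log a non-fatal warning and return True if no accelerated BLAS is seen."""
    if is_blas_accelerated(library_names):
        return False
    # "betse" is a hypothetical logger name used purely for illustration.
    logger = logger or logging.getLogger("betse")
    logger.warning(
        "Numpy appears to be linked against an unaccelerated BLAS library; "
        "expect slow dot products."
    )
    return True
```

At startup, BETSE could then call warn_if_blas_unaccelerated(get_numpy_blas_library_names()). Keeping the name matching separate from the Numpy query means the former can be exercised without any particular Numpy build installed.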
Accelerated BLAS Libraries: Does the Mythical Unicorn Actually Exist?
Yup! It does and it's awesome. Accelerated BLAS libraries include (in descending order of efficiency):
- Intel® Math Kernel Library (MKL), possibly regarded as the fastest and most stable of the accelerated BLAS libraries. As the ® implies, we do not like it. Closed-source is bad source. Hence, no link. So say we all.
- BLIS, commonly regarded as the fastest but least stable of the accelerated open-source BLAS libraries. Perfect for those with an appetite for high-risk drunken B.A.S.E. jumping.
- OpenBLAS, commonly regarded as the second-fastest but second-least stable of the accelerated open-source BLAS libraries. Mediocre jack-of-all-trades or merely sublime? You decide.
- ATLAS, commonly regarded as the slowest but stablest of the accelerated open-source BLAS libraries. This is not a bad thing.
- GotoBLAS2, obsoleted by OpenBLAS. Next.
It should be noted that Anaconda >= 2.5 ships MKL-optimized packages by default. This includes Numpy. Happiness ensues. Our work may already be done for the overwhelming majority of end users. Except me. Because Gentoo.
In summation, welcome to the machine.
@dglmoore Hey, Dougster. Are you still spawning a googolplex of concurrent BETSE simulations? If so, you're probably at the epicentre of performance woes with BETSE – implying this issue might be of mutual interest. Let's ride the efficiency dromedary to success together!