[Feature Request] Profile-guided Scalability Optimization

This feature request assumes that issue #5 (closed) (Show Us the Bottlenecks) has been resolved. Do that first.

By "scalability optimization," we mean optimization scaled across multiple CPUs and/or GPUs rather than constrained to a single CPU. The existing "Profile-guided Brute-force Optimization" feature request covers the latter.

May the TensorFlow Be with You

A wide array of open-source Pythonic frameworks for improving the scalability of pure-Python applications exists – including:

SymPy + TensorFlow. SymPy is a Python framework for expressing symbolic math. (Think Maple or Mathematica on open-source performance-enhancing stimulants.) TensorFlow is a Google-funded initiative targeting both machine learning and superscalar supercomputing across multiple CPUs and GPUs. TensorFlow is implemented in highly optimized C++ (where absurd efficiency is demanded), CUDA (Nvidia's proprietary parallelization API for Nvidia-only GPUs), and Python (where readability and maintainability are paramount, meaning everywhere). BETSE wouldn't particularly benefit from the machine learning aspects of TensorFlow (yet), as we already leverage machine learning provided out-of-the-box by SciPy (e.g., Metropolis-Hastings-style basin hopping). BETSE would, however, absolutely benefit from the supercomputing aspects of TensorFlow, which provides a generalized improvement over older MapReduce-based approaches. TensorFlow is the prevailing "gold standard" in the Python community for scaling computational analytics up to the massively Corinthian heights of Big Data. BETSE isn't quite Big Data (yet), but we can still co-opt all of the same optimizations. Thanks, Google! TensorFlow would be our first-choice go-to, ideally.
SymPy + Theano. Theano is a Python framework for distributing computations that provides a similar (albeit more limited) alternative to TensorFlow... with some of the scalability and less of the machine learning. Theano is a decidedly awesome workhorse, but has arguably begun to show a bit of bit-rot around its edges. The glue factory is calling. Google explicitly developed TensorFlow to replace Theano, which it has since internally deprecated. Still, Theano remains a solid, blue-collar choice for exploring deep parallelization with Python.
And many, many similar frameworks – including Caffe, MXNet, NumExpr, and the innumerable list goes on and on.

May the TensorFlow not Be with You

All is not well in TensorFlow World, however. Even the great die young. Outstanding large-scale issues with TensorFlow include:

No explicit Windows support. Meanwhile, Theano explicitly supports Windows. While TensorFlow does technically appear to be installable under Windows 10 by installing Bash support, the process is highly non-trivial, error-prone, and (of course) Windows 10-specific. Hence, Windows remains unsupported in general.
No OpenCL support. OpenCL support appears to be forthcoming but is likely to prove non-trivial. Hence, TensorFlow only supports CUDA-based Nvidia GPU parallelization. Since Theano also only supports CUDA-based Nvidia GPU parallelization, this is effectively ignorable for the time being. (It should be noted that the infamous BVLC Caffe fork transparently supports OpenCL. Just sayin'.)
Not shipped by default with Anaconda – presumably due to the lack of Windows support, the principal use case for Anaconda. Since Theano is also not shipped by default with Anaconda, TensorFlow again gets a free pass here.

May the TensorFlow not Be with You

In either case, BETSE should probably only interface directly with SymPy as a parallelization frontend. Doing so substantially enhances extensibility, permitting us to "hot-plug" the optimal parallelization backend for the current platform into SymPy at runtime without hardcoding any specific parallelization logic into BETSE herself. This is critical, as installing these backends:

Is typically non-trivial, platform-specific, and failure-prone. Theano, for example, requires gcc to be concurrently installed. wat.
Should only ever be optional. Since scalability ultimately reduces to a non-essential optimization, scalability should never be enforced by requiring usage of specific backends. In the absence of either TensorFlow or Theano, SymPy gracefully falls back to any of several less efficient (albeit ubiquitous) approaches – including eval statements, NumPy-based lambda expressions, and so on.

SymPy, by comparison, is trivially installable and shipped out-of-the-box with default Anaconda installations. Adding SymPy as a mandatory dependency of BETSE is a considerably more reasonable proposition than adding any specific parallelization backend as a mandatory dependency.
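To make the "optional backend" idea above concrete, here is a minimal sketch of runtime backend selection using only the standard library. The `pick_backend` helper and the `PREFERRED_BACKENDS` ordering are hypothetical names invented for illustration; nothing like them exists in BETSE today.

```python
import importlib.util

# Hypothetical preference order (an assumption for illustration): fastest
# first, with the standard library's "math" module as the last resort.
PREFERRED_BACKENDS = ("tensorflow", "numpy", "math")


def pick_backend(preferred=PREFERRED_BACKENDS):
    """Return the name of the first backend in `preferred` that is installed."""
    for name in preferred:
        # find_spec() locates a top-level module without importing it, so a
        # missing optional backend is skipped rather than raising an error.
        if importlib.util.find_spec(name) is not None:
            return name
    raise RuntimeError("no usable backend found")


backend = pick_backend()
print(f"selected backend: {backend}")
```

The names "tensorflow", "numpy", and "math" are also values accepted by the `modules` argument of `sympy.lambdify`, so the selected name could in principle be handed to it directly. Whether that is the right integration point for BETSE is left open here.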

SymPy: Even You Are Non-ideal

SymPy and NumPy are mutually exclusive. You can't cleanly mix-and-match both in the same application. SymPy provides a SymPy-specific matrix class roughly analogous to NumPy's ndarray class. SymPy expressions must use this matrix class. Use of NumPy's ndarray class is confined to output code dynamically generated by SymPy. The two are bitter competitors – not friendly siblings, casual acquaintances, or even indifferent strangers.
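A minimal sketch of where that boundary sits, assuming both SymPy and NumPy are installed. The symbol names and the rotation matrix are arbitrary examples, not anything taken from BETSE. The expression is built entirely from SymPy objects, and an ndarray only appears in the output of the function that SymPy generates.

```python
import numpy as np
import sympy as sp

# Symbolic side: the expression is built from SymPy symbols and sp.Matrix.
x, y = sp.symbols("x y")
rotation = sp.Matrix([[sp.cos(x), -sp.sin(x)],
                      [sp.sin(x), sp.cos(x)]])
scaled = y * rotation

# Numeric side: lambdify generates a NumPy-backed function from the expression.
f = sp.lambdify((x, y), scaled, modules="numpy")

# Calling the generated function with plain numbers yields a NumPy array.
result = f(np.pi / 2, 2.0)
print(type(result).__name__, result.shape)
```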

What does this mean? Probably that SymPy is definitively out and absolutely everything I wrote above is complete rubbish. Since BETSE is self-obviously not dropping NumPy support, we are left with the execrable choice between TensorFlow, Theano, and Caffe. While the first appears to be the future, the gargantuan chore of installing any of these frameworks (particularly under bad-boy Windows) complicates end-user life and my growing migraine.

The Void Stares Back

Welcome to the infinite funhouse of computational mirrors. Where even the frameworks and APIs recurse into infinity.

Edited Sep 10, 2020 by Cecil Curry