Example 07 simulation far slower than example 04: NumPy threading?
Example 07 runs about seven times slower than example 04 even though it is only about twice as large (see timings below). Its process time is also far larger than its wall time, which hints at multithreading occurring somewhere (see also !28). However, FONSim does not explicitly use threading anywhere, so it must occur in one of the imported modules.
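For reference, a minimal sketch of how a wall/process time pair like the ones below can be measured with the standard library. `time_call` and `busy_sum` are hypothetical names used only for illustration; `busy_sum` stands in for an actual simulation run and is not part of FONSim.

```python
import time


def time_call(func, *args, **kwargs):
    """Run func once and return (result, wall_seconds, process_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


def busy_sum(n):
    # Hypothetical stand-in for a simulation run.
    return sum(i * i for i in range(n))


result, wall, cpu = time_call(busy_sum, 200_000)
# process_time() adds up CPU time over all threads of the process, so a
# process/wall ratio well above 1 suggests several threads were busy.
ratio = cpu / wall if wall > 0 else float("nan")
print(f"wall {wall:.3f} s, process {cpu:.3f} s, ratio {ratio:.1f}")
```

For a single-threaded workload such as `busy_sum`, the ratio should stay close to 1.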
Could it be NumPy? Apparently NumPy (more precisely, the BLAS and OpenMP libraries it links against) may use multiple threads without explicitly being asked to. This behaviour can be disabled by placing the following four lines before the NumPy import; they set three environment variables:
import os
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["NUMEXPR_NUM_THREADS"] = "1"
os.environ["OMP_NUM_THREADS"] = "1"
Doing so makes the wall and process time match again. It's also more than twice as fast!
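The same workaround can be wrapped in a small helper, sketched below. `limit_threads` and `THREAD_ENV_VARS` are hypothetical names, and `OPENBLAS_NUM_THREADS` is an addition that is not in the snippet above; it is included on the assumption that the installed NumPy might be built against OpenBLAS rather than MKL.

```python
import os

# Variables read by common threaded backends. Which one actually applies
# depends on how the installed NumPy was built, so this list is an assumption.
THREAD_ENV_VARS = (
    "MKL_NUM_THREADS",
    "NUMEXPR_NUM_THREADS",
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",  # not in the snippet above; read by OpenBLAS builds
)


def limit_threads(n=1):
    """Cap backend thread counts; call before NumPy is first imported."""
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(n)


limit_threads(1)
# import numpy as np  # import only after the variables have been set
```

The variables are typically read once when the backend is loaded, so setting them after the NumPy import may have no effect. If a runtime switch is needed, the `threadpoolctl` package (`threadpool_limits`) may be worth a look.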
Timings, pairs of wall and process time (simulation timestep = 1e-3 s):
threading | example 04 (wall, process) | example 07 (wall, process) |
---|---|---|
default | 1.1, 1.1 | 7.6, 60.1 |
disabled | 1.1, 1.1 | 3.2, 3.2 |
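As a quick check of the ratios implied by these timings, here is a short sketch that only redoes the arithmetic on the numbers in the table; the variable names are made up for illustration.

```python
# (wall, process) pairs copied from the table above.
timings = {
    "default": {"04": (1.1, 1.1), "07": (7.6, 60.1)},
    "disabled": {"04": (1.1, 1.1), "07": (3.2, 3.2)},
}

wall_07_default, cpu_07_default = timings["default"]["07"]
wall_07_disabled, _ = timings["disabled"]["07"]
wall_04, _ = timings["default"]["04"]

# Process time divided by wall time: roughly how many cores were kept busy.
busy_cores = cpu_07_default / wall_07_default
# Wall-time gain from disabling threading in example 07.
speedup = wall_07_default / wall_07_disabled
# Example 07 relative to example 04, with and without threading.
slowdown_default = wall_07_default / wall_04
slowdown_disabled = wall_07_disabled / wall_04

print(f"busy cores (07, default):      {busy_cores:.1f}")
print(f"speedup from disabling (07):   {speedup:.1f}x")
print(f"07 vs 04, threading default:   {slowdown_default:.1f}x")
print(f"07 vs 04, threading disabled:  {slowdown_disabled:.1f}x")
```

With threading at its default, example 07 keeps close to eight cores busy yet takes about 6.9 times as long as example 04; with threading disabled the factor drops to about 2.9, much nearer to the roughly twofold difference in size.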
This raises some important questions:
Why does NumPy choose to multithread if doing so slows down the calculations?
Are we perhaps using NumPy incorrectly?
Is this a bug, or rather an undesired feature, in NumPy?
A few internet searches seem to point to the last option...