Explore multithreading with OpenMP
The calculation of a cluster vector is a trivially parallelizable problem, since the cluster vector elements can be computed independently of each other. It should thus be possible to speed up the cluster vector calculation by multithreading with OpenMP, which is what this MR implements.
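A minimal sketch of the idea, assuming each element can be computed by an independent per-element function. `compute_element` and `compute_cluster_vector` are hypothetical names, not the actual functions in this MR. Because every iteration writes only to its own slot, a single `#pragma omp parallel for` is enough, and no locks or reductions are needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the work behind one cluster vector element.
// It depends only on its own index, which is what makes the loop below
// safe to parallelize.
double compute_element(std::size_t index)
{
    double sum = 0.0;
    for (std::size_t k = 1; k <= 1000; ++k)
        sum += 1.0 / static_cast<double>(k + index);
    return sum;
}

// Fills the cluster vector in parallel. Each iteration writes only to its
// own slot of the pre-sized vector, so no locks or reductions are needed.
// Without -fopenmp the pragma is ignored and the loop simply runs serially.
std::vector<double> compute_cluster_vector(std::size_t n_elements)
{
    std::vector<double> cv(n_elements);
    // Signed loop counter: older OpenMP versions (e.g. 2.0 in MSVC) require it.
    const long n = static_cast<long>(n_elements);
#pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < n; ++i)
        cv[static_cast<std::size_t>(i)] = compute_element(static_cast<std::size_t>(i));
    return cv;
}
```

`schedule(dynamic)` is a guess. If elements differ a lot in cost (e.g. larger clusters being more expensive), dynamic scheduling may balance the load better than the default static split, but that would need benchmarking.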
It works and gives a modest speedup. Here are timings for a canonical MC run (run-canon.py) with different numbers of threads on my desktop:
- Master branch: 45.89 s
- 1 thread: 47.47 s (0.97x relative to master)
- 2 threads: 43.42 s (1.06x)
- 4 threads: 35.53 s (1.29x)
I set OMP_NUM_THREADS on the command line for each run. I have not yet tested on a cluster where more threads are available.
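For benchmarking, it may help to log how many threads were actually used. Below is a small sketch (the function name `max_threads` is hypothetical) that respects `OMP_NUM_THREADS` and still compiles when OpenMP is unavailable, by guarding on the standard `_OPENMP` macro.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Number of threads the next parallel region will use by default.
// With OpenMP enabled this honours OMP_NUM_THREADS; without OpenMP the
// code is serial, so 1 is returned. The compiler defines _OPENMP whenever
// OpenMP is enabled (e.g. with -fopenmp).
int max_threads()
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

Printing `max_threads()` at the start of a run would confirm that the OMP_NUM_THREADS setting actually took effect.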
Is it worth it?
- In many cases there would be no gain, since one typically runs one MC simulation per core anyway. However, I imagine multithreading could be quite useful for Wang-Landau simulations, which are not as trivially parallelizable as standard MC runs. And of course it is sometimes nice to get some extra speed in standard MC simply by using more cores.
- Does this increase the risk of unexpected bugs (e.g. race conditions) enough that we would need dedicated tests?
- Could compilation become trickier on some systems, e.g. with compilers that lack OpenMP support?
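On the testing question, one cheap safeguard is to check that the parallel result matches a serial run exactly. Here is a sketch with hypothetical names. Because every element is computed independently and there is no cross-thread reduction, the two results should be bitwise identical. Any difference would point to a data race, such as threads sharing a scratch buffer. The OpenMP `if` clause lets the same loop run either way.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pure per-element computation with no shared mutable state.
double element_value(std::size_t index)
{
    double x = 1.0;
    for (std::size_t k = 0; k < 500; ++k)
        x = 0.5 * x + 1.0 / static_cast<double>(index + k + 1);
    return x;
}

// Computes all elements. The OpenMP `if` clause decides at run time whether
// the loop is executed by a team of threads or serially.
std::vector<double> compute_all(std::size_t n_elements, bool parallel)
{
    std::vector<double> out(n_elements);
    const long n = static_cast<long>(n_elements);
    (void)parallel;  // unused when compiled without OpenMP
#pragma omp parallel for if (parallel)
    for (long i = 0; i < n; ++i)
        out[static_cast<std::size_t>(i)] = element_value(static_cast<std::size_t>(i));
    return out;
}

// Regression check: with independent elements and no cross-thread reduction,
// the parallel result should be bitwise identical to the serial one.
bool parallel_matches_serial(std::size_t n_elements)
{
    return compute_all(n_elements, false) == compute_all(n_elements, true);
}
```

A data race does not necessarily show up on every run, so a check like this only catches problems probabilistically. Running it with several OMP_NUM_THREADS values would make it more useful.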
It is worth noting that the Eigen library can already make use of OpenMP. The changes to CMakeLists.txt are roughly taken from there, but to be honest I don't really know what I'm doing.
Edited by Magnus Rahm