Change default for StatesBlockSize
Description
Until now, the default value of StatesBlockSize for CPU runs depended on the number of OpenMP threads. However, the code performs poorly for StatesBlockSize > 8, because larger blocks put much higher pressure on the memory bandwidth and thus reduce cache reuse.
Testing different hybrid MPI/OpenMP combinations showed that the OpenMP parallelization scales much better when this variable is kept fixed instead of depending on the number of threads.
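The bandwidth argument can be illustrated with a rough, back-of-the-envelope estimate. This is a hypothetical sketch, not Octopus code and not how Octopus chooses the default: it assumes complex double-precision wavefunctions (16 bytes per value), an illustrative 1 MiB of cache per core, and that all states of a block are touched together at each grid point. The function names are made up for illustration. Under these assumptions, doubling StatesBlockSize doubles the data touched per grid point and halves the number of grid points whose block data fits in cache at once.

```python
# Hypothetical estimate only; assumes complex double precision (16 bytes per value)
# and that every state of a block is accessed together at each grid point.
BYTES_PER_COMPLEX_DOUBLE = 16


def bytes_per_grid_point(block_size: int, bytes_per_value: int = BYTES_PER_COMPLEX_DOUBLE) -> int:
    """Bytes touched per grid point when `block_size` states are processed as one block."""
    return block_size * bytes_per_value


def grid_points_in_cache(block_size: int, cache_bytes: int,
                         bytes_per_value: int = BYTES_PER_COMPLEX_DOUBLE) -> int:
    """Number of grid points whose block data fits into a cache of `cache_bytes`."""
    return cache_bytes // bytes_per_grid_point(block_size, bytes_per_value)


if __name__ == "__main__":
    cache = 1 << 20  # assumed 1 MiB of cache per core (illustrative value)
    for bs in (4, 8, 16, 32):
        print(f"block size {bs:2d}: {bytes_per_grid_point(bs):4d} B/point, "
              f"{grid_points_in_cache(bs, cache):6d} points fit in cache")
```

With the assumed 1 MiB cache, this prints 8192 grid points for a block size of 8 and only 2048 for a block size of 32, which is one way to picture why larger blocks mean less cache reuse per memory transfer.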
News snippet
Change default for StatesBlockSize
Checklist
- I have checked that my code follows the Octopus coding standards.
- I have added tests for all the new features added in this request.