EXX FFT decomposition
This merge request intends to fix two things:
- `-pd .true.` or `.false.` was not honored in `exx.f90` and `exx_band.f90`, where pencil decomposition was always used.
- A GPU-to-CPU memory transfer was in the wrong place in `vexx_gamma_gpu`.
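For readers unfamiliar with the two layouts, here is a minimal sketch of how a 3D FFT grid is split in each case. It is a generic textbook illustration, not the data layout used in `exx.f90`; the helper names and the grid and process counts are hypothetical.

```python
def split(n, parts):
    """Sizes of `parts` near-equal contiguous chunks of an axis of length n."""
    base, extra = divmod(n, parts)
    return [base + (1 if i < extra else 0) for i in range(parts)]


def slab_local_shapes(grid, nproc):
    """Slab (1D) decomposition: only z is distributed, each rank owns full xy planes."""
    nx, ny, nz = grid
    return [(nx, ny, lz) for lz in split(nz, nproc)]


def pencil_local_shapes(grid, p_rows, p_cols):
    """Pencil (2D) decomposition: y and z are distributed, each rank owns full x columns."""
    nx, ny, nz = grid
    return [(nx, ly, lz) for ly in split(ny, p_rows) for lz in split(nz, p_cols)]


if __name__ == "__main__":
    grid = (8, 8, 8)  # hypothetical FFT grid
    print("slab  :", slab_local_shapes(grid, 4))
    print("pencil:", pencil_local_shapes(grid, 2, 2))
```

In this generic picture a slab layout needs one global transpose per 3D FFT but cannot use more ranks than there are planes, while a pencil layout scales to more ranks at the cost of a second transpose. That trade-off is why slab can be the faster choice at small node counts.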
For a number of test cases, this merge request improves performance by 20% to 50%.
Example: SiC supercell containing 256 atoms.
- Input files
- Using two Cori Haswell nodes, switching from pencil to slab decomposition speeds up vexx by ~30%.
- Using one Summit node, removing the redundant memcpy improves vexx performance by ~20%, and slab decomposition improves it by another ~20%. Overall, the time spent in vexx drops from 715 to 446 seconds.
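As a quick sanity check on the Summit numbers, the two ~20% gains can be compared with the measured total. This is only arithmetic on the figures quoted above; treating the two gains as compounding multiplicatively is an assumption.

```python
# Timings quoted above for vexx on one Summit node (seconds).
t_before = 715.0
t_after = 446.0

overall_reduction = 1.0 - t_after / t_before
speedup = t_before / t_after

# Assumption: the two ~20% gains compound multiplicatively.
t_expected = t_before * 0.8 * 0.8

print(f"overall reduction: {overall_reduction:.1%}")
print(f"speedup factor:    {speedup:.2f}x")
print(f"compounded estimate: {t_expected:.0f} s")
```

The measured time is roughly 38% lower (about a 1.6x speedup), close to the ~458 s that two compounded 20% reductions would give.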
Results are not affected in any way as far as I can tell.