WIP: Reorder the communication strategies to improve performance
Description
Right now, all ranks are distributed in a 4D hypercube where the dimensions are domain, states, k points, and other (in this order). The call to MPI_Cart_create returns a cartesian communicator where usually all ranks are assigned in a row-major way. This means that the last index runs fastest (other) and the first index runs slowest (domain).
Usually, tasks are assigned to nodes in a block fashion, i.e. the first ranks (0 to x-1) go to the first node, the ranks x to 2x-1 go to the second node, etc. As a result, the last parallelization strategy is placed as compactly as possible, whereas the first parallelization strategy is always spread out over several nodes. Currently, this means the domain parallelization is not compact: the ranks sharing the domain for the same state and k point are usually spread out over several nodes.
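To make the current layout concrete, here is a minimal Python sketch of the row-major rank ordering combined with block placement on nodes. It does not call MPI; `rank_to_coords`, the grid sizes and `cores_per_node` are hypothetical values chosen for illustration, not taken from Octopus.

```python
def rank_to_coords(rank, dims):
    """Row-major mapping: the last dimension varies fastest."""
    coords = []
    for extent in reversed(dims):
        coords.append(rank % extent)
        rank //= extent
    return tuple(reversed(coords))


# Hypothetical setup: 4 domains x 2 state groups, 4 cores per node.
dims = (4, 2, 1, 1)      # (domain, states, k points, other)
cores_per_node = 4
n_ranks = 4 * 2 * 1 * 1

for rank in range(n_ranks):
    node = rank // cores_per_node  # block placement of ranks on nodes
    print(rank, rank_to_coords(rank, dims), "node", node)

# Ranks that share the domain decomposition of state group 0:
domain_group = [r for r in range(n_ranks)
                if rank_to_coords(r, dims)[1:] == (0, 0, 0)]
group_nodes = sorted({r // cores_per_node for r in domain_group})
print(domain_group, group_nodes)  # [0, 2, 4, 6] [0, 1]
```

In this toy setup the four ranks of one domain group sit on two different nodes, so every domain communication crosses the interconnect.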
The problem here is that communication inside a node is faster than across nodes because it can be handled in the shared memory as opposed to going through the interconnect. Thus, the communication that happens most often should be between ranks that are as compact as possible (to avoid going through the interconnect).
In Octopus, the most frequent communication is the one belonging to the domain parallelization strategy. Thus, it should be beneficial for most applications to have a compact arrangement of the tasks in the domain parallelization. This can be achieved by reordering the dimensions of the parallelization hypercube: if the domain parallelization is the last dimension, it is placed as compactly as possible on the nodes.
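The effect of the reordering can be sketched with a small Python comparison that counts how many nodes a single domain group touches under each ordering. This is an illustration only, assuming row-major rank ordering and block placement on nodes; `max_nodes_per_domain_group` and the sizes used are hypothetical and not part of Octopus.

```python
from itertools import product


def max_nodes_per_domain_group(dims, domain_axis, cores_per_node):
    """Largest number of nodes touched by any single domain group."""
    groups = {}
    # product() enumerates coordinates in row-major order (last index fastest),
    # so the enumeration index plays the role of the cartesian rank.
    for rank, coords in enumerate(product(*(range(n) for n in dims))):
        # Ranks with equal non-domain coordinates form one domain group.
        key = coords[:domain_axis] + coords[domain_axis + 1:]
        groups.setdefault(key, set()).add(rank // cores_per_node)
    return max(len(nodes) for nodes in groups.values())


cores_per_node = 4  # hypothetical node size

# Old order: (domain, states, k points, other), domain is axis 0.
old = max_nodes_per_domain_group((4, 2, 1, 1), 0, cores_per_node)
# Reversed order: (other, k points, states, domain), domain is axis 3.
new = max_nodes_per_domain_group((1, 1, 2, 4), 3, cores_per_node)

print(old, new)  # 2 1
```

With the domain dimension last and the number of domains equal to the cores per node, each domain group fits on one node in this example.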
In this commit, the dimensions of the parallelization strategies are reversed to have the most used parallelization strategies placed as compact as possible.
Tests have shown up to 20% speed-up depending on the input data. For best performance, set ParDomains to the number of cores per node (or a fraction/multiple of this number, e.g. 1/2 or 1/4 or 2).
News snippet
Reorder communication strategies to improve performance
Checklist
- I have checked that my code follows the Octopus coding standards
Activity
added Core Optimization labels
Codecov Report
Merging #434 into develop will increase coverage by 0.01%. The diff coverage is 63.23%.
@@            Coverage Diff             @@
##           develop    #434      +/-   ##
==========================================
+ Coverage    69.89%   69.9%   +0.01%
==========================================
  Files          430     479      +49
  Lines       115748   92888   -22860
==========================================
- Hits         80905   64937   -15968
+ Misses       34843   27951    -6892
Impacted Files, Coverage Δ:
src/system/eigen_rmmdiis.F90 0% <ø> (ø)
src/basic/distributed.F90 98.87% <ø> (-1.13%)
src/hamiltonian/hgh_projector.F90 97.14% <ø> (-2.86%)
src/math/lalg_adv.F90 24.78% <ø> (-2.36%)
src/grid/mesh_cube_map.F90 94.11% <ø> (+2.62%)
src/grid/mesh_function.F90 100% <ø> (ø)
src/basic/sort.F90 93.75% <ø> (+66.47%)
src/basic/global.F90 90.32% <ø> (-1.35%)
src/grid/stencil_cube.F90 45.31% <ø> (+2.75%)
src/math/solvers.F90 0% <ø> (ø)
... and 850 more
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9fb13ea...2b68745.
added 636 commits
- 2a1c08a9...325d62e5 - 635 commits from branch develop
- 6ea687a5 - Merge remote-tracking branch 'origin/develop' into reorder_communication
added 154 commits
- 6ea687a5...7893d7fa - 153 commits from branch develop
- 5689301d - Merge remote-tracking branch 'origin/develop' into reorder_communication
added 1015 commits
- 5689301d...9d17e053 - 1014 commits from branch develop
- 2b687451 - Merge remote-tracking branch 'origin/develop' into reorder_communication
This MR has been superseded by !733 (merged), where the feature has been implemented in a different way and can also be disabled.
mentioned in merge request !733 (merged)