Hexagonal PBC and MPI - Redmine #2125
Archive from user: Bart Bruininks
Hey GROMACS people,
I was recently trying to increase the efficiency of my membrane-particle
fusion box by changing the PBC to be more like a hexagon (box vectors 10
10 10 0 0 5 0 10 5). I know this is not a perfect regular hexagon, since
the box should not have square proportions but a ratio of something like
10\*3^0.5/2, but I guess it should work anyway. When I create a box
with these dimensions, it runs perfectly on a hyperthreaded 6-core machine.
However, when I move to multiple nodes and start using MPI with
GROMACS 2016, things go severely wrong in fewer than 100 steps. I
tried different versions of GROMACS (5.1.1, 5.1.4 and 2016.1), but the
issue was always the same. I can't say with 100% certainty that it is
indeed MPI that causes it to go wrong, but whenever I ask
for more cores than one node can provide, the issue presents itself. I
am not a hardcore programmer myself and wouldn't be able to solve the issue
or find exactly what goes wrong, but I would like to
point out the problem. I will attach the md.tpr file I am trying to run (it
is a MARTINI system, but that should not matter too much; running
with -rdd 2.0 might be necessary though). Though this is possibly a small bug,
I think it would be worth fixing, because these hexagonal boxes
are very nice for any particle migrating into a membrane.
Cheers and hopefully it can be resolved,
Bart
*(from redmine: issue id 2125, created on 2017-02-16 by gmxdefault, closed on 2018-01-09)*
* Changesets:
* Revision b1a0f28eb503c5e7974dc8c998797cb71c3f0b42 by Berk Hess on 2018-01-08T18:37:34Z:
```
Fix triclinic domain decomposition bug
With triclinic unit-cells with vectors a,b,c, the domain decomposition
would communicate an incorrect halo along dimension x when b[x]!=0
and vector c not parallel to the z-axis. The halo cut-off bound plane
was tilted incorrect along x/z with an error approximately
proportional to b[x]*(c[x] - b[x]*c[y]/b[y]).
When c[x] > b[x]*c[y]/b[y], the communicated halo was too small, which
could cause instabilities or silent errors.
When c[x] < b[x]*c[y]/b[y], the communicated halo was too large, which
could cause some communication overhead.
Fixes #2125
Change-Id: I2109542292beca5be26eddc262e0974c4ae825ea
```
* Revision 3a33815834794d83af60b0d16a5ef22015c4dfdf by Berk Hess on 2018-01-09T07:45:26Z:
```
Fix triclinic domain decomposition bug
With triclinic unit-cells with vectors a,b,c, the domain decomposition
would communicate an incorrect halo along dimension x when b[x]!=0
and vector c not parallel to the z-axis. The halo cut-off bound plane
was tilted incorrect along x/z with an error approximately
proportional to b[x]*(c[x] - b[x]*c[y]/b[y]).
When c[x] > b[x]*c[y]/b[y], the communicated halo was too small, which
could cause instabilities or silent errors.
When c[x] < b[x]*c[y]/b[y], the communicated halo was too large, which
could cause some communication overhead.
Fixes #2125
Change-Id: I2109542292beca5be26eddc262e0974c4ae825ea
(cherry picked from commit b1a0f28eb503c5e7974dc8c998797cb71c3f0b42)
```
* Uploads:
* [md.tpr](/uploads/c26dfc894cbc7050937e40bf39cb68c9/md.tpr) The tpr file which should reproduce the bug when used in combination with MPI
* [md-rdd-2.log](/uploads/33520beab3e59d3bca07ef0a8feef73e/md-rdd-2.log)
* [md-no-rdd.log](/uploads/bb13202cbe0a0c3fc0e37c7d8678a57d/md-no-rdd.log)
* [md-single-rank.log](/uploads/216ef5e8be262b38b9d603fca8831086/md-single-rank.log)
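
To make the condition in the changeset message above a bit more concrete, here is a minimal Python sketch that evaluates the quoted expression `b[x]*(c[x] - b[x]*c[y]/b[y])` for the box from the report. It is only an illustration, not GROMACS code: the helper names `parse_gro_box`, `halo_error_factor` and `classify_halo` are made up for this example, and it assumes the nine numbers in the report are given in `.gro` box-line order (`v1(x) v2(y) v3(z) v1(y) v1(z) v2(x) v2(z) v3(x) v3(y)`), which the report does not state explicitly.

```python
def parse_gro_box(line):
    """Split a .gro-style box line into vectors a, b, c.

    Assumes .gro ordering: v1(x) v2(y) v3(z) v1(y) v1(z) v2(x) v2(z) v3(x) v3(y).
    A three-value line is treated as a rectangular box.
    """
    values = [float(tok) for tok in line.split()]
    if len(values) == 3:
        values += [0.0] * 6
    if len(values) != 9:
        raise ValueError("expected 3 or 9 box values")
    xx, yy, zz, xy, xz, yx, yz, zx, zy = values
    return (xx, xy, xz), (yx, yy, yz), (zx, zy, zz)


def halo_error_factor(b, c):
    """Expression quoted in the commit message: b[x]*(c[x] - b[x]*c[y]/b[y])."""
    return b[0] * (c[0] - b[0] * c[1] / b[1])


def classify_halo(b, c):
    """Apply the cases listed in the commit message (illustrative only)."""
    # The message limits the bug to b[x] != 0 and c not parallel to the z-axis.
    if b[0] == 0 or (c[0] == 0 and c[1] == 0):
        return "not affected"
    skew = c[0] - b[0] * c[1] / b[1]
    if skew > 0:
        return "halo too small"
    if skew < 0:
        return "halo too large"
    return "error term is zero"


# Box vectors as written in the report (ordering is an assumption, see above).
a, b, c = parse_gro_box("10 10 10 0 0 5 0 10 5")
print(halo_error_factor(b, c), classify_halo(b, c))  # 37.5 halo too small
```

Under that ordering assumption the reported box has b = (5, 10, 0) and c = (10, 5, 10), which falls into the "halo too small" case that the message associates with instabilities. A hexagonal prism with c along the z-axis (for example `10 8.66 10 0 0 5 0 0 0`) would not meet the stated conditions at all.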