Error in MPI_Allreduce when applying AWH biasing and for multiple sharing biases - Redmine #2433
Archive from user: Viveca Lindahl
When running e.g.

```
mpirun -np 4 $gmx_mpi mdrun -v -multidir walker-1 walker-2
```

there is an error from MPI_Allreduce:

```
*** An error occurred in MPI_Allreduce
*** on communicator MPI_COMM_WORLD
*** MPI_ERR_COMM: invalid communicator
```

on my machine, or, when running on the cluster:
```
Rank 8 [Thu Mar 1 16:03:40 2018] [c1-0c0s8n3] Fatal error in MPI_Allreduce: Invalid communicator, error stack:
MPI_Allreduce(1007): MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x22e08f0, count=337, MPI_INT, MPI_SUM, MPI_COMM_NULL) failed
MPI_Allreduce(926).: Null communicator
```
A tpr file for this is attached. This is similar to
https://redmine.gromacs.org/issues/2403 in that there is no error when
there is only one rank per `-multidir` directory, i.e.
`mpirun -np 2 $gmx_mpi mdrun -v -multidir walker-1 walker-2` runs
error-free. I have all related fixes so far applied.
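The error stack points at a collective being entered with `MPI_COMM_NULL`: with more than one rank per simulation, the non-master ranks hold no inter-simulation communicator, yet still reach the MPI_Allreduce that shares the AWH bias. A minimal sketch of the guard that avoids this, in plain C with a stub communicator type (this is not GROMACS or real MPI code; all names here are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Stub standing in for an MPI communicator handle; NULL plays the
 * role of MPI_COMM_NULL. Real MPI aborts when a collective such as
 * MPI_Allreduce is entered with a null communicator, matching the
 * reported error stack. */
struct comm { int unused; };
typedef struct comm *comm_t;

/* Hypothetical per-rank state: with -multidir, only the master rank
 * of each simulation belongs to the inter-simulation communicator. */
struct rank_state {
    int    is_sim_master;  /* 1 on the simulation's master rank */
    comm_t multisim_comm;  /* valid only where is_sim_master == 1 */
};

/* Buggy pattern: every rank enters the collective, so with more than
 * one rank per simulation the non-master ranks pass a null
 * communicator. Returns 0 to model the "Invalid communicator" abort. */
int share_bias_buggy(const struct rank_state *s)
{
    return s->multisim_comm != NULL;
}

/* Fixed pattern: only communicator members enter the inter-simulation
 * reduction; the shared result can then be distributed within each
 * simulation over its own intra-simulation communicator. */
int share_bias_fixed(const struct rank_state *s)
{
    if (!s->is_sim_master)
    {
        return 1;  /* non-master ranks skip the inter-sim reduce */
    }
    return s->multisim_comm != NULL;
}
```

This would explain why one rank per directory works: there every rank is its simulation's master and holds a valid inter-simulation communicator.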
*(from redmine: issue id 2433, created on 2018-03-01 by gmxdefault, closed on 2018-03-02)*
* Changesets:
* Revision cb1947ec92d56179ad3dacccdbc700f4d5a8fdd3 by Berk Hess on 2018-03-02T09:14:47Z:
Fix AWH bias-sharing with parallel simulation
Sharing the AWH bias over multiple simulations only worked when each simulation was running on a single MPI rank. Now parallel simulations also work.
Fixes #2433 (closed).
Change-Id: I71f9069a31b033151c772aac84c9912d91b213a1
* Uploads:
* [awh-share-on.tpr](/uploads/8e83905e585a5fffbbf9a5105b910109/awh-share-on.tpr)