Thread-MPI error in GROMACS-2018 - Redmine #2540
Archive from user: Siva Dasetty
Hello,
I have come across an error that causes GROMACS (2018/2018.1) to crash.
The message is:
“tMPI error: Receive buffer size too small for transmission (in valid
comm)
Aborted”
The error seems to only occur immediately following a LINCS or SETTLE
warning. The error is reproducible across different systems. A simple
example system is running an energy minimization on a box of 1000 rigid
TIP4P/Ice water molecules generated with gmx solvate. When SETTLE is
used as the constraint algorithm, there are several SETTLE warnings in
the early steps of the energy minimization, and GROMACS will crash with
the above error message. If I replace SETTLE with LINCS, GROMACS crashes
with the same error message following a LINCS warning. Other systems
that have produced this error are -OH terminated self assembled
monolayer surfaces (h-bonds constrained by LINCS), and mica surfaces
(h-bonds constrained by LINCS). Naturally, reducing -ntmpi to 1
eliminates the error for all cases.
The problem does appear to be hardware-dependent. Specifically, the
tested cluster nodes contain K20/K40 GPUs with Intel Xeon E5-2680v3
processors (20/24 cores). I built GROMACS with GCC 5.4.0 and CUDA
8.0.44. An installation on my desktop machine with very similar options
does not show the thread-MPI error.
Example of a procedure that causes the error (node with 24 cores and 2 K40 GPUs):
```
gmx solvate -cs tip4p -o box.gro -box 3.2 3.2 3.2 -maxsol 1000
gmx grompp -f em.mdp -c box.gro -p tip4pice.top -o em
export OMP_NUM_THREADS=6
gmx mdrun -v -deffnm em -ntmpi 4 -ntomp 6 -pin on
```
Attached are the relevant topology (tip4pice.top), mdp (em.mdp), tpr
(em.tpr), and log (em.log) files. In addition, the tip4p.gro and
box.gro files are included.
Thanks in advance for any ideas as to what might be causing this
problem,
Siva Dasetty
*(from redmine: issue id 2540, created on 2018-06-01 by gmxdefault, closed on 2018-06-12)*
* Changesets:
* Revision dce23f771ac909e36815aeb76fe99f9a615bead3 by Berk Hess on 2018-06-06T22:33:48Z:
```
Fix MPI inconsistency in EM after constraint failure
Fixes issue #2540
Change-Id: Id18c17af82f80917388c11fc776b79bf4966a4ac
```
* Uploads:
* [tip4p.gro](/uploads/0c3409a4b4e37e61f5411120656d1e55/tip4p.gro) input .gro file used in gmx solvate.
* [box.gro](/uploads/9f819c8b396b28a120a754b45ddb5483/box.gro) .gro file obtained with gmx solvate.
* [em.log](/uploads/4171e2c92074b9f827adcd3f42bff385/em.log) .log file obtained during energy minimization.
* [tip4pice.top](/uploads/53cebdc706c1fc58e27bf189732df918/tip4pice.top) TIP4P/Ice topology file.
* [em.mdp](/uploads/da7e15035c40b21b905daea7c8d87e54/em.mdp) energy minimization parameter file.
* [em.tpr](/uploads/6890d4df04fac597bcc578a8480fb12f/em.tpr) .tpr file (energy minimization of TIP4P/Ice water)