DLB issue with 2D/3D domain decomposition - Redmine #2830
Using the lignocellulose_rf benchmark (version from PRACE, md5sum
592f8fbcc77e7dfe221d6068b3c96b6b) with 2560 ranks, the 2019 version fails
with:
step 200: The domain decomposition grid has shifted too much in the
Z-direction around cell 20 5 1.
This happens with GCC 7.1 or ICC 18u2, on SKL or KNL. It also happens with GMX_DLB_BASED_ON_FLOPS=1 or -dds .75, and regardless of whether -dd 128 20 1 or -dd 40 8 8 (the default) is used. It is fine with fewer ranks or with 2018.3.
CC=gcc CXX=g++ cmake .. -DGMX_MPI=on -DGMX_SIMD=AVX_512_KNL -DGMX_HWLOC=no
ibrun ~/gromacs/gcc7.3/bin/gmx_mpi mdrun -s lignocellulose-rf.tpr -nsteps 3000 -noconfout
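As a quick sanity check on the two -dd grids quoted above, the following sketch confirms that both use all 2560 ranks and shows which one is a 2D and which a 3D decomposition. It assumes every rank is a particle-particle rank (no separate PME ranks), which seems reasonable for a reaction-field benchmark but is not stated in the report. The helper name `decomposed_dims` is made up for illustration.

```python
from math import prod  # Python 3.8+

TOTAL_RANKS = 2560  # rank count from the report

# -dd grids quoted in the report, as (nx, ny, nz)
grids = {
    "-dd 128 20 1": (128, 20, 1),
    "-dd 40 8 8": (40, 8, 8),
}


def decomposed_dims(grid):
    """Count the dimensions that are split into more than one cell."""
    return sum(1 for n in grid if n > 1)


for flag, grid in grids.items():
    # Assumption: all ranks are used for domain decomposition.
    assert prod(grid) == TOTAL_RANKS
    print(f"{flag}: {prod(grid)} ranks, {decomposed_dims(grid)}D decomposition")
```

Under that assumption, the failure is reproduced with both a 2D grid (128 x 20 x 1) and a 3D grid (40 x 8 x 8).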
(from redmine: issue id 2830, created on 2019-01-13 by rolandschulz, closed on 2019-01-23)
- Changesets:
- Revision ca3e8f89 by Berk Hess on 2019-01-23T09:32:01Z:
Fix error with 2D/3D DLB
With 2D or 3D dynamic domain decomposition with dynamic load balancing,
mdrun would exit with a fatal error when a cell size was limited.
This bug was introduced in commit 49367d45.
Fixes #2830
Change-Id: If36fcc2ddbb45c0855c78a2767b1d8562584b76f