Seg Fault when running flat-bottom position restraints with MPI - Redmine #2095
Archive from user: Yunlong Liu

I compiled GROMACS (git master branch & 2016.1 release) with the following settings:

* GCC 5.2.0 / GCC 4.9.2
* OpenMPI 2.0.1 / MPICH 3.2
* OpenMP enabled
* FFTW 3.3.5
* AVX2\_256
* CUDA 7.5
* CUDA\_HOST\_COMPILER 4.9.2

In my position restraint topology files, I applied flat-bottom position restraints to three atoms. But when I started my GROMACS job using

    mpirun -np 4 gmx_mpi mdrun ...

OpenMPI reported a seg fault (the output of the two crashing processes is interleaved):

    [gpu072:50339] *** Process received signal ***
    [gpu072:50339] Signal: Segmentation fault (11)
    [gpu072:50339] Signal code: Address not mapped (1)
    [gpu072:50339] Failing at address: (nil)
    [gpu072:50338] *** Process received signal ***
    [gpu072:50338] Signal: Segmentation fault (11)
    [gpu072:50338] Signal code: Address not mapped (1)
    [gpu072:50338] Failing at address: (nil)
    [gpu072:50339] [ 0] /lib64/libpthread.so.0(+0xf790)[0x2aaaaf001790]
    [gpu072:50339] [ 1]
    [gpu072:50338] [ 0] /lib64/libpthread.so.0(+0xf790)[0x2aaaaf001790]
    [gpu072:50338] [ 1] /home-4/yliu120@jhu.edu/opt2/lib64/libgromacs_mpi.so.3(+0x49662b)[0x2aaaab16362b]
    [gpu072:50339] [ 2] /home-4/yliu120@jhu.edu/opt2/lib64/libgromacs_mpi.so.3(+0x49662b)[0x2aaaab16362b]
    [gpu072:50338] [ 2] /home-4/yliu120@jhu.edu/opt2/lib64/libgromacs_mpi.so.3(+0x497fe2)[0x2aaaab164fe2]
    [gpu072:50339] [ 3] /home-4/yliu120@jhu.edu/opt2/lib64/libgromacs_mpi.so.3(+0x497fe2)[0x2aaaab164fe2]
    [gpu072:50338] [ 3] /home-4/yliu120@jhu.edu/opt2/lib64/libgromacs_mpi.so.3(_Z17dd_make_local_topP12gmx_domdec_tP18gmx_domdec_zones_tiPA3_fPfPiP10t_forcerecS4_P11gmx_vsite_tPK10gmx_mtop_tP14gmx_localtop_t+0x354)[0x2aaaab1654bd]
    [gpu072:50339] [ 4] /home-4/yliu120@jhu.edu/opt2/lib64/libgromacs_mpi.so.3(_Z17dd_make_local_topP12gmx_domdec_tP18gmx_domdec_zones_tiPA3_fPfPiP10t_forcerecS4_P11gmx_vsite_tPK10gmx_mtop_tP14gmx_localtop_t+0x354)[0x2aaaab1654bd]
    [gpu072:50338] [ 4] /home-4/yliu120@jhu.edu/opt2/lib64/libgromacs_mpi.so.3(_Z19dd_partition_systemP8_IO_FILElP9t_commreciiP7t_statePK10gmx_mtop_tPK10t_inputrecS4_PSt6vectorIN3gmx11BasicVectorIfEESaISE_EEP9t_mdatomsP14gmx_localtop_tP10t_forcerecP11gmx_vsite_tP10gmx_constrP6t_nrnbP13gmx_wallcyclei+0x1464)[0x2aaaab15c890]
    [gpu072:50339] [ 5] /home-4/yliu120@jhu.edu/opt2/lib64/libgromacs_mpi.so.3(_Z19dd_partition_systemP8_IO_FILElP9t_commreciiP7t_statePK10gmx_mtop_tPK10t_inputrecS4_PSt6vectorIN3gmx11BasicVectorIfEESaISE_EEP9t_mdatomsP14gmx_localtop_tP10t_forcerecP11gmx_vsite_tP10gmx_constrP6t_nrnbP13gmx_wallcyclei+0x1464)[0x2aaaab15c890]
    [gpu072:50338] [ 5] gmx_mpi[0x429f6e]
    [gpu072:50339] [ 6] gmx_mpi[0x423b91]
    [gpu072:50339] [ 7] gmx_mpi[0x429f6e]
    [gpu072:50338] [ 6] gmx_mpi[0x423b91]
    [gpu072:50338] [ 7] gmx_mpi[0x428150]
    [gpu072:50339] [ 8] gmx_mpi[0x428150]
    [gpu072:50338] [ 8] /home-4/yliu120@jhu.edu/opt2/lib64/libgromacs_mpi.so.3(+0x452977)[0x2aaaab11f977]
    [gpu072:50339] [ 9] /home-4/yliu120@jhu.edu/opt2/lib64/libgromacs_mpi.so.3(+0x452977)[0x2aaaab11f977]
    [gpu072:50338] [ 9] /home-4/yliu120@jhu.edu/opt2/lib64/libgromacs_mpi.so.3(_ZN3gmx24CommandLineModuleManager3runEiPPc+0x38d)[0x2aaaab12142d]
    [gpu072:50339] [10] /home-4/yliu120@jhu.edu/opt2/lib64/libgromacs_mpi.so.3(_ZN3gmx24CommandLineModuleManager3runEiPPc+0x38d)[0x2aaaab12142d]
    [gpu072:50338] [10] gmx_mpi[0x41941c]
    [gpu072:50338] [11] gmx_mpi[0x41941c]
    [gpu072:50339] [11] /lib64/libc.so.6(__libc_start_main+0xfd)[0x2aaaaf22dd5d]
    [gpu072:50338] [12] /lib64/libc.so.6(__libc_start_main+0xfd)[0x2aaaaf22dd5d]
    [gpu072:50339] [12] gmx_mpi[0x419299]
    [gpu072:50338] *** End of error message *** gmx_mpi[0x419299]
    [gpu072:50339] *** End of error message ***

The OpenMPI stack trace shows that the segfault occurs below the dd\_make\_local\_top() function (domdec.h), which is called from dd\_partition\_system(). However, when I removed mpirun, in other words, when I ran the tpr using only one process with multiple threads, I didn’t get any seg fault.
I attached the tpr file that can trigger this seg fault.

*(from redmine: issue id 2095, created on 2016-12-29 by gmxdefault, closed on 2017-01-20)*

* Relations:
  * duplicates #2236
* Changesets:
  * Revision 9a45db56461a639bf9b2e8fde360e66420a3e7f6 by Berk Hess on 2017-01-05T16:56:06Z:
```
Fix flat-bottom position restraints + DD + OpenMP

When using flat-bottom position restraints with DD and OpenMP
a (re)allocation was missing, causing a segv.

Fixes #2095.

Change-Id: I03af546a0b8d03a3d384d86a2582a67584e72d46
```
* Uploads:
  * [step6.5_equilibration.tpr](/uploads/25f1e9def57be7c59c7783af136e63c3/step6.5_equilibration.tpr)
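
The changeset message attributes the crash to a missing (re)allocation when domain decomposition and OpenMP are combined. The actual patch is not reproduced in this issue, so the following is only a generic sketch of that class of bug, not the GROMACS code: per-thread scratch buffers must be sized for the current thread count before any thread writes to them. All names here (`ThreadScratch`, `ensureScratch`, `collectRestrained`) are hypothetical, and the outer loop merely stands in for an OpenMP parallel region.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-thread scratch buffer; not a GROMACS type.
struct ThreadScratch
{
    std::vector<int> restrainedAtoms; // atom indices found by one thread
};

// Grow the set of scratch buffers so that every thread has one.
// This is the "(re)allocation" step: if it is skipped, threads beyond
// the previously allocated count would access storage that does not exist.
void ensureScratch(std::vector<ThreadScratch>* scratch, std::size_t numThreads)
{
    if (scratch->size() < numThreads)
    {
        scratch->resize(numThreads);
    }
}

// Collect the indices of restrained atoms, splitting the atom range into
// numThreads chunks. The loop over t stands in for an OpenMP parallel region.
std::vector<int> collectRestrained(const std::vector<bool>&    isRestrained,
                                   std::size_t                 numThreads,
                                   std::vector<ThreadScratch>* scratch)
{
    assert(numThreads > 0);
    ensureScratch(scratch, numThreads); // must happen before any "thread" writes

    const std::size_t numAtoms = isRestrained.size();
    for (std::size_t t = 0; t < numThreads; t++)
    {
        ThreadScratch& mine = (*scratch)[t];
        mine.restrainedAtoms.clear();
        const std::size_t begin = numAtoms * t / numThreads;
        const std::size_t end   = numAtoms * (t + 1) / numThreads;
        for (std::size_t a = begin; a < end; a++)
        {
            if (isRestrained[a])
            {
                mine.restrainedAtoms.push_back(static_cast<int>(a));
            }
        }
    }

    // Merge the per-thread results in thread order, which keeps them sorted.
    std::vector<int> merged;
    for (std::size_t t = 0; t < numThreads; t++)
    {
        const ThreadScratch& ts = (*scratch)[t];
        merged.insert(merged.end(), ts.restrainedAtoms.begin(), ts.restrainedAtoms.end());
    }
    return merged;
}
```

In this sketch, calling `collectRestrained` first with one chunk and then with four reuses the same `scratch` vector; the `ensureScratch` call is what keeps the second call from indexing past the end of it.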