incorrect results with Ubuntu 18.04 / glibc 2.27 (?) and >20 threads - Redmine #2762
Archive from user: Dmytro Kovalskyy

Hi,

Setup:

* Dual Xeon
* NVIDIA P5000
* Ubuntu 18.04, 4.15.0-36-generic
* Gromacs 2018.3, CUDA 10

Gromacs is compiled with `cmake .. -DGMX_GPU=ON -DGMX_USE_NVML=ON`.

The problem: MD with GPU crashes with the following error:

> Program: gmx mdrun, version 2018.3
>
> Source file: src/gromacs/gpu_utils/cudautils.cu (line 110)
>
> Fatal error: HtoD cudaMemcpyAsync failed: invalid argument

When run as `gmx mdrun -deffnm md200ns -v -nb cpu` or `gmx mdrun -deffnm md200ns -v -nb gpu -pme cpu`, MD goes as expected.

When Gromacs is compiled with the Debug option, the MD run goes as expected with no additional options, i.e. `gmx mdrun -deffnm md200ns -v`:

> 1 GPU auto-selected for this run. Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node: PP:0,PME:0

TPR and log files are attached.

Thank you

*(from redmine: issue id 2762, created on 2018-11-15 by gmxdefault, closed on 2019-06-03)*

* Changesets:
  * Revision 9f45a4be4b5e24f49c4c1ce8db8144b21631a891 by Berk Hess on 2019-05-27T09:04:23Z:

    ```
    Work around gcc 7 avx512 bug

    Due to an avx512 loop vectorization bug in gcc 7, many non-bonded
    interactions could be ignored when running with more than
    16 OpenMP threads.

    Fixes #2762

    Change-Id: I3e03fde7114542bbd43069166a6c74937fc0f986
    ```

* Uploads:
  * [md200ns.tpr](/uploads/fe4faafc09bf79f4f417ceeccadc4678/md200ns.tpr)
  * [md200ns.log](/uploads/14045821a635ffdc8e9ccda3bcbc2934/md200ns.log)
  * [quick.out](/uploads/1a89037c1f76dc3b4563c314af8c0c46/quick.out) stdout+stderr
  * [quick.log](/uploads/e6613ff283b5e0836d8b1df78980c05e/quick.log) log running on gpu20, built on bs-gpu01
  * [quick.tpr](/uploads/f0f0eb753cc31c46558019a78effe8b5/quick.tpr) same tpr, just with only 1000 steps
  * [mdrun.out](/uploads/a5953b75b1eb25e5488cc3316a06dd81/mdrun.out)
  * [md.log](/uploads/6087fccf02306deab431b1e222358478/md.log)
  * [nbnxn_sum.cpp](/uploads/cf6d0ce420e13911f7a6b9aa8a5db4ea/nbnxn_sum.cpp)
  * [nbnxn_sum.s](/uploads/c63ddec7375ee491433bbf4ebfa23262/nbnxn_sum.s)
  * [nbnxn_sum.s.avx2_256](/uploads/f9a72aa0610adb6a092a050065710b9c/nbnxn_sum.s.avx2_256)
  * [nbnxn_search-nobug.asm](/uploads/51c84869fd97def3b96140b31be3ff4d/nbnxn_search-nobug.asm)
  * [nbnxn_search-bug.asm](/uploads/a6c1f241a3871a5eecb3b99ebb181aae/nbnxn_search-bug.asm)
  * [nbnxn_sum.cpp.o](/uploads/32565106cb57485b44049767feeccc65/nbnxn_sum.cpp.o)
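
The commit message ties the bug to runs with more than 16 OpenMP threads. On an affected build, one way to check whether a failure matches this issue might be to rerun with the thread count capped at that threshold. The sketch below is only an illustration under that assumption: `capped_ntomp` is a hypothetical helper, not part of GROMACS, and the cap of 16 is taken from the commit message rather than verified independently.

```shell
#!/bin/sh
# Hypothetical helper (not part of GROMACS): cap a requested OpenMP
# thread count at 16, the threshold named in the commit message.
capped_ntomp() {
    requested="$1"
    if [ "$requested" -gt 16 ]; then
        echo 16
    else
        echo "$requested"
    fi
}
```

A possible use, assuming `nproc` is available, is `gmx mdrun -deffnm md200ns -v -ntomp "$(capped_ntomp "$(nproc)")"`, where `-ntomp` sets the number of OpenMP threads per rank. If the run behaves correctly with the cap but not without it, that is consistent with the bug described here, though not proof of it.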