incorrect results with Ubuntu 18.04 / glibc 2.27 (?) and >20 threads - Redmine #2762
Archive from user: Dmytro Kovalskyy
Hi,
Setup:
Dual Xeon, NVIDIA P5000
Ubuntu 18.04 (kernel 4.15.0-36-generic), Gromacs 2018.3, CUDA 10
Gromacs is compiled with
cmake .. -DGMX\_GPU=ON -DGMX\_USE\_NVML=ON
The problem:
MD with GPU crashes with the following error:
Program: gmx mdrun, version 2018.3
Source file: src/gromacs/gpu\_utils/cudautils.cu (line 110)
Fatal error:
HtoD cudaMemcpyAsync failed: invalid argument
When run with
gmx mdrun -deffnm md200ns -v -nb cpu
or
gmx mdrun -deffnm md200ns -v -nb gpu -pme cpu
the MD run proceeds as expected.
When Gromacs is compiled with the Debug option, the MD run also proceeds as expected
with no additional options,
i.e.
gmx mdrun -deffnm md200ns -v
1 GPU auto-selected for this run.
Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
PP:0,PME:0
TPR and log files are attached.
Thank you.
*(from redmine: issue id 2762, created on 2018-11-15 by gmxdefault, closed on 2019-06-03)*
* Changesets:
* Revision 9f45a4be4b5e24f49c4c1ce8db8144b21631a891 by Berk Hess on 2019-05-27T09:04:23Z:
```
Work around gcc 7 avx512 bug
Due to an avx512 loop vectorization bug in gcc 7, many non-bonded
interactions could be ignored when running with more than 16 OpenMP
threads.
Fixes #2762
Change-Id: I3e03fde7114542bbd43069166a6c74937fc0f986
```
* Uploads:
* [md200ns.tpr](/uploads/fe4faafc09bf79f4f417ceeccadc4678/md200ns.tpr)
* [md200ns.log](/uploads/14045821a635ffdc8e9ccda3bcbc2934/md200ns.log)
* [quick.out](/uploads/1a89037c1f76dc3b4563c314af8c0c46/quick.out) stdout+stderr
* [quick.log](/uploads/e6613ff283b5e0836d8b1df78980c05e/quick.log) log running on gpu20, built on bs-gpu01
* [quick.tpr](/uploads/f0f0eb753cc31c46558019a78effe8b5/quick.tpr) same tpr, just with only 1000 steps
* [mdrun.out](/uploads/a5953b75b1eb25e5488cc3316a06dd81/mdrun.out)
* [md.log](/uploads/6087fccf02306deab431b1e222358478/md.log)
* [nbnxn_sum.cpp](/uploads/cf6d0ce420e13911f7a6b9aa8a5db4ea/nbnxn_sum.cpp)
* [nbnxn_sum.s](/uploads/c63ddec7375ee491433bbf4ebfa23262/nbnxn_sum.s)
* [nbnxn_sum.s.avx2_256](/uploads/f9a72aa0610adb6a092a050065710b9c/nbnxn_sum.s.avx2_256)
* [nbnxn_search-nobug.asm](/uploads/51c84869fd97def3b96140b31be3ff4d/nbnxn_search-nobug.asm)
* [nbnxn_search-bug.asm](/uploads/a6c1f241a3871a5eecb3b99ebb181aae/nbnxn_search-bug.asm)
* [nbnxn_sum.cpp.o](/uploads/32565106cb57485b44049767feeccc65/nbnxn_sum.cpp.o)
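
The changeset above says the problem appears when running with more than 16 OpenMP threads. Going only on that statement, a build without the fix might avoid the affected case by capping the OpenMP thread count with mdrun's `-ntomp` option. This is a rough sketch and an assumption, not a workaround confirmed anywhere in this issue. The helper `cap_omp_threads` and the example thread count of 40 are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of GROMACS): cap a requested OpenMP thread
# count at 16, the threshold named in the commit message above.
cap_omp_threads() {
    requested=$1
    if [ "$requested" -gt 16 ]; then
        echo 16
    else
        echo "$requested"
    fi
}

# Assumed example: a node where 40 threads would otherwise be used.
ntomp=$(cap_omp_threads 40)

# Print the command rather than running it, so the sketch works without GROMACS.
echo "gmx mdrun -deffnm md200ns -v -ntomp $ntomp"
```

Upgrading to a release that contains the changeset is the fix recorded in this issue. Capping threads would at best sidestep the trigger condition.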