Miscalculated LJ(SR) when running with GPU? - Redmine #3056
Collaborators in the Mobley Lab found an issue where the LJ (SR) energy appears to be miscalculated on the GPU. I'm not sure whether this is present in the most recent code (a bit harder for me to test on GPU); they reported that similar issues were found in the 2019 beta.
I've attached the input files for both the GPU and CPU runs; as you can see by looking at mdout.mdp, they are processed identically.
At the initial time step, if you look at the energy.xvg files, all of the entries are roughly the same (presumably what one would expect from single-precision machine precision)... except for LJ (SR).
I’m not an expert at the GPU code, so I did not try to investigate.
Entry            CPU              GPU
Bond             511.556519       511.556427
Harmonic Pot.    0.224793         0.224793
Angle            1768.662231      1768.662842
Proper Dih.      9718.273438      9718.266602
Improper Dih.    0.405944         0.405945
Improper Dih.    75.552689        75.552696
LJ-14            2799.869141      2799.871338
Coulomb-14       39090.589844     39090.554688
LJ (SR)          99445.546875     197122.046875   <----
Disper. corr.    -3431.618896     -3431.618896
Coulomb (SR)     -901030.000000   -901163.062500
Coul. recip.     1618.536377      1618.538086
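The comparison above can be checked mechanically. A minimal sketch that flags energy terms whose CPU/GPU relative difference is larger than single-precision rounding would explain (the 1e-3 tolerance is my own assumption, loose enough to absorb accumulated single-precision error, not a GROMACS-prescribed value):

```python
# Energy terms from the table above: (name, CPU value, GPU value).
terms = [
    ("Bond",          511.556519,     511.556427),
    ("Harmonic Pot.", 0.224793,       0.224793),
    ("Angle",         1768.662231,    1768.662842),
    ("Proper Dih.",   9718.273438,    9718.266602),
    ("Improper Dih.", 0.405944,       0.405945),
    ("Improper Dih.", 75.552689,      75.552696),
    ("LJ-14",         2799.869141,    2799.871338),
    ("Coulomb-14",    39090.589844,   39090.554688),
    ("LJ (SR)",       99445.546875,   197122.046875),
    ("Disper. corr.", -3431.618896,   -3431.618896),
    ("Coulomb (SR)",  -901030.000000, -901163.062500),
    ("Coul. recip.",  1618.536377,    1618.538086),
]

def relative_difference(cpu, gpu):
    """Absolute difference scaled by the larger magnitude."""
    return abs(cpu - gpu) / max(abs(cpu), abs(gpu))

# Flag anything differing by more than 0.1% relative.
flagged = [name for name, cpu, gpu in terms
           if relative_difference(cpu, gpu) > 1e-3]
print(flagged)  # ['LJ (SR)'] -- off by roughly a factor of 2
```

Every other term agrees to within roughly 1e-4 relative or better, consistent with single-precision evaluation; only LJ (SR) stands out.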
Notes from the student:
The GPUs are Nvidia TitanX GPUs.
We have a Gromacs 2018-3 version and a 2019-beta version compiled for
that partition.
The previous test I ran was with 2018-3; I also tried 2019-beta earlier, and if I remember correctly it gave the same errors/issues.
I didn't compile them; one of the students who did sent me the instructions he used (for 2019-beta):
cmake3 .. -DGMX_GPU=on -DGMX_SIMD=AVX2_256 \
-DCMAKE_INSTALL_PREFIX=$TARGET \
-DGMX_BUILD_OWN_FFTW=ON \
-DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-9.2 \
-DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ \
-DREGRESSIONTEST_DOWNLOAD=OFF
(from redmine: issue id 3056, created on 2019-08-11 by mrshirts, closed on 2019-09-04)
- Changesets:
- Revision a5409af7 by Berk Hess on 2019-09-02T12:10:31Z:
Fix incorrect rvdw on GPU with rvdw<rcoulomb
When rvdw < rcoulomb was set in the mdp file, rvdw would initially
be set to rcoulomb on the GPU. With default mdrun settings,
the correct rvdw would be set after 2*nstlist steps by PME tuning.
TODO: Add an mdrun test case with rvdw<rcoulomb, refs #3062
Fixes #3056
Change-Id: I7243f27e75e46adedd668822dcd6b9045ef98a3f
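The condition the fix describes can be reproduced with an mdp fragment along these lines (a hypothetical sketch; the specific cutoff values are my assumptions, not the reporter's actual inputs):

```
; rvdw shorter than rcoulomb -- the setting affected by the bug.
; Before the fix, the GPU path initially used rcoulomb as the VdW
; cutoff; PME tuning would only restore the correct rvdw after
; 2*nstlist steps with default mdrun settings.
cutoff-scheme = Verlet
coulombtype   = PME
rcoulomb      = 1.2
rvdw          = 1.0
```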
- Uploads:
- inputs.zip Inputs for both CPU and GPU
- outputs.zip Outputs for both CPU and GPU