mdrun gives different distance restraint potential energies depending on the number of OpenMP threads - Redmine #2029

Archive from user: Tim Flint

Hi,

I get a spuriously large distance restraint potential energy when running mdrun with multiple OpenMP threads, and sooner or later (depending on the restraint force constant) the system adopts a weird configuration. The same simulation (same tpr file) runs fine with a single OpenMP thread. Here are the distance-restraint-related mdp options I used (see the uploaded tpr file for details):

disre = simple
disre-weighting = conservative
disre-tau = 0
disre-fc = 1e-5
nstdisreout = 10000

I ran mdrun with a single MPI rank and -ntomp set to 1, 2, and 4; only -ntomp 1 gave sane results. The log file from the run with 1 OpenMP thread looks like:

           Step           Time         Lambda
              0        0.00000        0.00000

   Energies (kJ/mol)
           Bond          Angle       G96Angle        LJ (SR)   Coulomb (SR)
    2.16662e+01    0.00000e+00    1.96570e+03    9.15548e+03    0.00000e+00
     Dis. Rest. D.R.Viol. (nm)      Potential    Kinetic En.   Total Energy
    5.56697e+04    2.90805e+04    6.68126e+04    1.57240e+04    8.25365e+04
    Temperature Pressure (bar)
    2.99970e+02   -6.28303e-07

while that from 2 OpenMP threads looks like:

           Step           Time         Lambda
              0        0.00000        0.00000

   Energies (kJ/mol)
           Bond          Angle       G96Angle        LJ (SR)   Coulomb (SR)
    2.16662e+01    0.00000e+00    1.96571e+03    9.15548e+03    0.00000e+00
     Dis. Rest. D.R.Viol. (nm)      Potential    Kinetic En.   Total Energy
    1.96610e+07    4.12632e+05    1.96722e+07    1.57284e+04    1.96879e+07
    Temperature Pressure (bar)
    3.00055e+02    5.46943e-06

It looks like all the other potential energy terms are independent of the number of OpenMP threads; only Dis. Rest. differs. Can anyone tell me whether I'm hitting a bug in GROMACS or whether I missed something in setting up the distance restraints?

Thanks,
Tim

(from redmine: issue id 2029, created on 2016-08-09 by gmxdefault, closed on 2016-10-31)
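
The comparison made in the report can be checked numerically. The sketch below is only an illustration: the values are copied from the two step-0 log excerpts above, while the helper name `differing_terms` and the relative tolerance of 1e-4 are assumptions chosen for this example and are not part of GROMACS.

```python
import math

# Potential energy terms at step 0 (kJ/mol), copied from the two log excerpts.
ONE_THREAD = {
    "Bond": 2.16662e+01,
    "Angle": 0.00000e+00,
    "G96Angle": 1.96570e+03,
    "LJ (SR)": 9.15548e+03,
    "Coulomb (SR)": 0.00000e+00,
    "Dis. Rest.": 5.56697e+04,
}
TWO_THREADS = {
    "Bond": 2.16662e+01,
    "Angle": 0.00000e+00,
    "G96Angle": 1.96571e+03,
    "LJ (SR)": 9.15548e+03,
    "Coulomb (SR)": 0.00000e+00,
    "Dis. Rest.": 1.96610e+07,
}


def differing_terms(reference, other, rel_tol=1e-4):
    """Return the names of terms whose values differ by more than rel_tol.

    rel_tol is an assumed tolerance, loose enough to ignore the last-digit
    difference in G96Angle between the two runs.
    """
    return [
        name
        for name, value in reference.items()
        if not math.isclose(value, other[name], rel_tol=rel_tol, abs_tol=1e-12)
    ]


if __name__ == "__main__":
    print(differing_terms(ONE_THREAD, TWO_THREADS))
    ratio = TWO_THREADS["Dis. Rest."] / ONE_THREAD["Dis. Rest."]
    print(f"Dis. Rest. ratio (2 threads / 1 thread): {ratio:.1f}")
```

With these numbers only the Dis. Rest. term is flagged, and its value with 2 threads is roughly 350 times the single-thread value.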

Made distance restraints work with threads and DD

The NMR distance restraints use several buffers for summing distances
that were indexed based on the index of the thread+domain local ilist
force atoms. This gives incorrect results with OpenMP and/or domain
decomposition. Using the type index for the restraint and a domain-
local, but not thread-local index for the pair resolves these issues.
There are now only two limitations left:
* Time-averaged restraints don't work with DD.
* Multiple copies of molecules in the same system without ensemble
  averaging do not work with DD.

Fixes #1117.
Fixes #1989.
Fixes #2029.

Change-Id: Ic51230aa19a4640caca29a7d7ff471e30a3d9f09
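
The indexing problem described in the commit message can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the GROMACS implementation: all function names, the contiguous work split, the harmonic energy form, and the single shared buffer are hypothetical, and the threads are simulated sequentially so the result is deterministic.

```python
import math


def split_among_threads(pairs, n_threads):
    """Split the pair list into contiguous chunks, one per simulated thread."""
    chunk = max(1, (len(pairs) + n_threads - 1) // n_threads)
    return [pairs[i:i + chunk] for i in range(0, len(pairs), chunk)]


def fill_buffer_thread_local(distances, n_threads):
    """Problematic variant: index the shared buffer by thread-local position."""
    buffer = [0.0] * len(distances)
    pairs = list(enumerate(distances))
    for chunk in split_among_threads(pairs, n_threads):
        for local_index, (_, r) in enumerate(chunk):
            # Every thread starts counting at 0, so threads overwrite each
            # other's slots and the remaining slots are never filled.
            buffer[local_index] = r
    return buffer


def fill_buffer_domain_local(distances, n_threads):
    """Corrected variant: index by a pair index shared by all threads."""
    buffer = [0.0] * len(distances)
    pairs = list(enumerate(distances))
    for chunk in split_among_threads(pairs, n_threads):
        for pair_index, r in chunk:
            buffer[pair_index] = r
    return buffer


def restraint_energy(buffer, r_ref=1.0, k=1.0):
    """Toy harmonic restraint energy summed over the buffered distances."""
    return sum(0.5 * k * (r - r_ref) ** 2 for r in buffer)


if __name__ == "__main__":
    distances = [1.0, 1.1, 0.9, 1.2]
    for n_threads in (1, 2):
        wrong = restraint_energy(fill_buffer_thread_local(distances, n_threads))
        right = restraint_energy(fill_buffer_domain_local(distances, n_threads))
        print(n_threads, wrong, right)
```

In this model the two variants agree with one thread, because the thread-local position then coincides with the pair index. With two threads the thread-local variant yields a different and much larger energy, which mirrors the symptom in the report.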