Hello, could you provide a shorter example, as well as the details on
how you ran the simulation (e.g. number of ranks, GPU usage, …).
I’ll try to reproduce this in the meantime, but more information would
definitely help.
Thank you!
Thank you for the quick response. Unfortunately, a halved example doesn’t
crash. I don’t use MPI or a GPU, and this is reproducible on multiple
machines. One example:
Running on 1 node with total 8 cores, 8 logical cores, 0 compatible GPUs
Hardware detected:
CPU info:
Vendor: Intel
Brand: Intel® Xeon® CPU E5-2630 v3 @ 2.40GHz
SIMD instructions most likely to fit this hardware: AVX2_256
SIMD instructions selected at GROMACS compile time: AVX2_256
(from redmine: written on 2017-12-06 by vedranmiletic)
Some more questions here. Does the bug happen with different
combinations of integrator/time step/thermostat?
Also, could you provide me with the files needed to generate the tpr
file, so I can test the different combinations?
Thanks!
This could indeed be an integer overflow in the pair list.
So the system will likely run with domain decomposition, which is
probably also faster because the ordering of particles improves cache
hits. Could you try with -ntmpi 2? You can also try -ntmpi 4 and 8 and
see which is fastest.
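To make the comparison above easier to repeat, here is a minimal sketch that builds one mdrun command line per thread-MPI rank count. The input name topol.tpr and the output prefixes are placeholders, not files from this issue; -s, -ntmpi and -deffnm are regular mdrun options.

```python
import shlex


def mdrun_commands(tpr_file, rank_counts=(2, 4, 8)):
    """Build one `gmx mdrun` command per thread-MPI rank count."""
    commands = []
    for n in rank_counts:
        commands.append([
            "gmx", "mdrun",
            "-s", tpr_file,           # run input file (placeholder name)
            "-ntmpi", str(n),         # number of thread-MPI ranks
            "-deffnm", f"ntmpi{n}",   # separate output prefix per run
        ])
    return commands


# Print the commands instead of running them, so they can be inspected first.
for cmd in mdrun_commands("topol.tpr"):
    print(shlex.join(cmd))
```

Each run writes its own log (for example ntmpi2.log); the performance summary at the end of each log can then be compared to see which rank count is fastest.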
I ran -ntmpi 2 and 4 myself. All crash with an atom flying away:
Atom 3595214 moved more than the distance allowed by the domain decomposition (125.000000) in direction X
distance out of cell 403.997559
New coordinates: 528.998 495.989 98.298
CPU runs hang at step 40, the second domain decomposition step.
So my first guess is that your setup is unstable.
Have you even looked at the energy output at step 0? I get:
Large VCM (group rest): 505.20956, -0.00001, -0.00002, Temp-cm: 1.65737e+07
   Energies (kJ/mol)
           Bond          Angle        LJ (SR)   Coulomb (SR)      Potential
    9.91842e+05    8.50307e+06    1.09500e+19    0.00000e+00    1.09500e+19
    Kinetic En.   Total Energy    Temperature Pressure (bar)
    2.22746e+35    2.22746e+35    3.19730e+30    1.97268e+28
Reopened because I uploaded a “fix” that checks for large energies at
step 0 and gives a fatal error on this system instead of an
assertion failure.
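For illustration only, a check of this kind could look like the sketch below. It is not the code from the actual patch: the per-atom limit and the atom count are assumptions (the thread only shows atom index 3595214, so the system has at least that many atoms), and the energies are the step-0 values quoted above.

```python
import math


def find_suspicious_energies(energies, n_atoms, limit_per_atom=1.0e6):
    """Return names of energy terms that are non-finite or too large per atom.

    limit_per_atom (kJ/mol per atom) is an arbitrary illustrative threshold.
    """
    suspicious = []
    for name, value in energies.items():
        if not math.isfinite(value) or abs(value) / n_atoms > limit_per_atom:
            suspicious.append(name)
    return suspicious


# Step-0 energies (kJ/mol) as quoted in this thread.
step0 = {
    "Bond": 9.91842e+05,
    "Angle": 8.50307e+06,
    "LJ (SR)": 1.09500e+19,
    "Coulomb (SR)": 0.0,
    "Potential": 1.09500e+19,
    "Kinetic En.": 2.22746e+35,
    "Total Energy": 2.22746e+35,
}

N_ATOMS = 3_600_000  # assumed round figure, not taken from the input files

bad = find_suspicious_energies(step0, N_ATOMS)
if bad:
    print("Suspicious step-0 energies:", ", ".join(bad))
```

With these numbers the bonded terms pass, while the LJ, potential, kinetic and total energies are flagged, which points to an unstable starting structure rather than a problem in the pair list.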
Gerrit received a related patchset ‘1’ for Issue #2333.
Uploader: Berk Hess (hess@kth.se)
Change-Id: gromacsrelease-2018I6e8aa1fac3a3c9a358b4046de5c8a3547ae14b15
Gerrit URL: https://gerrit.gromacs.org/7325
(from redmine: written on 2017-12-11 by gmxdefault)