Energy minimisation won't go to the minimum point when GPU is used.

Forces computed on the GPU differ slightly from those computed on the CPU (and even between different parallelisations on the CPU), so an energy minimization that follows heuristics for step size, etc. can reach different minimum points. You can reduce such effects by taking smaller EM step sizes. If you really want to work hard to get to the bottom of a particular minimum, you need to use a double-precision build running on the CPU.
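To make the step-size heuristic concrete, here is a minimal toy sketch in Python. It is not GROMACS code. It assumes the scheme described in the GROMACS reference manual for steepest descent: displace along the force scaled by the largest force component, grow the step by 1.2 when the energy drops, and halve it when the step is rejected. The quadratic test potential, the function names, and the `1e-12` step floor are all hypothetical choices for illustration.

```python
def steepest_descent(energy, gradient, x, h0=0.01, f_tol=1e-2, max_steps=10000):
    """Toy steepest descent; returns (x, fmax, steps, converged)."""
    h = h0
    e = energy(x)
    for step in range(max_steps):
        force = [-g for g in gradient(x)]  # force is the negative gradient
        fmax = max(abs(f) for f in force)
        if fmax < f_tol:
            return x, fmax, step, True
        # Move along the force, scaled so the largest displacement equals h.
        trial = [xi + fi / fmax * h for xi, fi in zip(x, force)]
        e_trial = energy(trial)
        if e_trial < e:
            x, e = trial, e_trial  # accept the step and try a larger one next
            h *= 1.2
        else:
            h *= 0.5               # reject the step and retry with a smaller one
            if h < 1e-12:          # assumed floor: the step size is exhausted
                return x, fmax, step, False
    return x, fmax, max_steps, False


def energy(p):
    # Hypothetical anisotropic quadratic basin, stiff along the second axis.
    return p[0] ** 2 + 10.0 * p[1] ** 2


def gradient(p):
    return [2.0 * p[0], 20.0 * p[1]]


x_min, fmax, steps, converged = steepest_descent(energy, gradient, [1.0, 1.0])
print(converged, steps, fmax)
```

Because every accept/reject decision depends on comparing two energies, small differences in the computed forces can send the walk down a different sequence of steps. That is one reason a smaller initial step (`h0` here, `emstep` in the mdp file) tends to make runs more reproducible.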

@mark.j.abraham Thanks for the explanation. In this case, the consequence is more drastic than failing to reach the very bottom of the energy basin: the maximum force increased from 2.4808354e+03 (step 0) to 1.2266455e+04, which is more than 10,000 kJ/mol/nm and would just make the simulation explode.


I did some testing with different emstep values. emstep = 0.01 (default)

Steepest Descents converged to machine precision in 74 steps,
but did not reach the requested Fmax < 1000.
Potential Energy  = -1.0135659e+05
Maximum force     =  1.2266455e+04 on atom 17
Norm of force     =  2.1724870e+02

emstep = 0.005

Steepest Descents converged to machine precision in 39 steps,
but did not reach the requested Fmax < 1000.
Potential Energy  = -9.9866617e+04
Maximum force     =  7.8121484e+03 on atom 17
Norm of force     =  1.8172151e+02

Only with emstep = 0.001 did the EM converge.

Steepest Descents converged to Fmax < 1000 in 5 steps
Potential Energy  = -9.3671531e+04
Maximum force     =  9.2433673e+02 on atom 6458
Norm of force     =  2.1773625e+02
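When comparing several runs like these, it can help to pull the numbers out of the summary block automatically. The following is a small Python sketch, not a GROMACS tool. The function name `parse_em_summary` and the dictionary keys are hypothetical, and the patterns assume the summary format exactly as pasted above.

```python
import re


def parse_em_summary(text):
    """Extract convergence status and force/energy values from an EM summary."""

    def grab(pattern, cast=float):
        match = re.search(pattern, text)
        return cast(match.group(1)) if match else None

    return {
        # A successful run prints "converged to Fmax < ..."; a failed one
        # prints "converged to machine precision" instead.
        "converged": "converged to Fmax <" in text,
        "potential_energy": grab(r"Potential Energy\s*=\s*(\S+)"),
        "max_force": grab(r"Maximum force\s*=\s*(\S+)"),
        "max_force_atom": grab(r"on atom\s+(\d+)", int),
        "force_norm": grab(r"Norm of force\s*=\s*(\S+)"),
    }


summary = """Steepest Descents converged to machine precision in 74 steps,
but did not reach the requested Fmax < 1000.
Potential Energy  = -1.0135659e+05
Maximum force     =  1.2266455e+04 on atom 17
Norm of force     =  2.1724870e+02
"""

result = parse_em_summary(summary)
print(result)
```

Running this over the output of each emstep setting gives a quick table of whether the run converged and which atom carries the maximum force.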


assigned to @acmnpv

I'm looking into it

Running a quick check on top of the 2022 branch, I can get the job to converge on the GPU in debug mode as well, to the same precision as your CPU runs:

On host paulbpc 2 GPUs selected for this run.
Mapping of GPU IDs to the 4 GPU tasks in the 4 ranks on this node:
  PP:0,PP:0,PP:1,PP:1
PP tasks will do (non-perturbed) short-ranged interactions on the GPU
PP task will update and constrain coordinates on the CPU

Steepest Descents:
   Tolerance (Fmax)   =  1.00000e+03
   Number of steps    =        10000

writing lowest energy coordinates.

Steepest Descents converged to Fmax < 1000 in 228 steps
Potential Energy  = -1.0515759e+05
Maximum force     =  8.8567792e+02 on atom 17
Norm of force     =  4.4011809e+01

I will check next whether there is a difference between release and debug builds, but this points to some quirky behavior in calculating the stop criterion for the energy minimization.

So, I can also get the correct behavior with a RelWithDebInfo build. But so far all my builds have used thread-MPI. Can you tell me the exact version of your MPI library and the node you are running things on?

I want to try to reproduce this exactly, to determine whether this is an actual issue or not.

Yes, testing a debug build with regular MPI brings back the problem. So this seems to be an actual issue with the force calculation when using CUDA and MPI, @pszilard @alangray3

Correction: this happens when we use a single MPI/thread-MPI rank (the previous tests used several thread-MPI threads). I can reproduce the incorrect behavior when running the test case with a single thread, and get the correct forces when using more than one thread, for both MPI and thread-MPI.

Does the problem reproduce with gmx mdrun -nb cpu? If so then the problem is most likely with the logic after the force computation.

Does it reproduce with 2021? If not then the problem may be with the change to atom ordering that Berk introduced so that non-DD runs use the same atom ordering as DD uses.

Or this change aa66efd3

When running with -nb cpu -nt 1, the correct minimum is still reached.

          :-) GROMACS - gmx mdrun, 2022.3-dev-20220721-7d5b6342b4 (-:

Executable:   /mnt/build-dirs/gerrit/patch/build-gcc-9-release/bin/gmx
Data prefix:  /mnt/build-dirs/gerrit/patch (source tree)
Working dir:  /mnt/build-dirs/bugs/4533
Command line:
  gmx mdrun -s gromacs.tpr -nb cpu -nt 1


Back Off! I just backed up md.log to ./#md.log.2#
Reading file gromacs.tpr, VERSION 2022.2 (single precision)
Using 1 MPI thread
Using 1 OpenMP thread 


NOTE: Thread affinity was not set.

Back Off! I just backed up traj.trr to ./#traj.trr.2#

Back Off! I just backed up ener.edr to ./#ener.edr.2#

Steepest Descents:
   Tolerance (Fmax)   =  1.00000e+03
   Number of steps    =        10000

writing lowest energy coordinates.

Back Off! I just backed up confout.gro to ./#confout.gro.1#

Steepest Descents converged to Fmax < 1000 in 228 steps
Potential Energy  = -1.0516330e+05
Maximum force     =  8.8395599e+02 on atom 17
Norm of force     =  4.4002903e+01

GROMACS reminds you: "A C program is like a fast dance on a newly waxed dance floor by people carrying razors." (Waldi Ravens)

I'll check the changes you mentioned. If they don't explain the break, I'll brute-force the bisect.

So, I can confirm that this worked in https://gitlab.com/gromacs/gromacs/-/tags/v2021, so it got broken somewhere along the way.

@mark.j.abraham, it is indeed aa66efd3

@xiki-tempula could you upload the files needed to generate the TPR for your system? I need to test some older versions as @mark.j.abraham pointed out, and your TPR is too new for them

changed milestone to %2022.3

added Bug label

@berkhess

I get correct forces with GMX_DD_SINGLE_RANK=1, so something is broken in a non-DD code path.

And with GMX_DD_SINGLE_RANK=0, using only the CPU also gives incorrect forces. So this is not GPU related.
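The combinations tested in this thread can be laid out as a small matrix. The Python sketch below only builds the command lines and environment settings; it does not run anything. The `-s`, `-nb` and `-nt` options and the `GMX_DD_SINGLE_RANK` variable are the ones quoted in the comments above. The helper name `mdrun_variants` is hypothetical, and the meaning of the variable's values is taken from this discussion only.

```python
import itertools


def mdrun_variants(tpr="gromacs.tpr"):
    """Return (command, extra_env) pairs for single-rank test runs."""
    variants = []
    for nb, dd_single in itertools.product(("cpu", "gpu"), ("0", "1")):
        # Same command line as in the log above, varying only -nb.
        cmd = ["gmx", "mdrun", "-s", tpr, "-nb", nb, "-nt", "1"]
        # Per the thread: =1 gave correct forces, =0 gave incorrect ones.
        extra_env = {"GMX_DD_SINGLE_RANK": dd_single}
        variants.append((cmd, extra_env))
    return variants


for cmd, extra_env in mdrun_variants():
    settings = " ".join(f"{key}={value}" for key, value in extra_env.items())
    print(settings, " ".join(cmd))
```

Each pair could be passed to `subprocess.run(cmd, env={**os.environ, **extra_env})` on a machine with GROMACS installed, and the resulting summaries compared run by run.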

OK, thanks for confirming. I'm still waiting on @xiki-tempula for the files so that I can start bisecting this.

@acmnpv Sorry, could you generate the TPR file from the files in the zip archive attached to the post? I mean Archive.zip.

Hello @xiki-tempula, yes, this seems to work. Sorry, I didn't see all the files before.

Thanks!

I found the issue and have a not-so-elegant fix. I'll write some documentation so it's clear how things work.

The breaking change is aa66efd3, @berkhess

mentioned in commit 79835327

mentioned in merge request !2940 (merged)

mentioned in commit 1690fb66

mentioned in commit f0104346

Fixed by !2940 (merged)

closed
