Energy minimisation does not reach the minimum when the GPU is used.
Summary
When I energy minimise a system on the GPU, the final maximum force is 1.2266455e+04. When the same tpr file is run with the -nb cpu
flag, the maximum force drops to 8.8434900e+02.
The simulation is just a simple compound being decoupled in water, which should be quite an easy system.
GROMACS version
:-) GROMACS - gmx_mpi, 2022.2 (-:
Executable: /opt/MD-software/gmx-2022.2/bin/gmx_mpi
Data prefix: /opt/MD-software/gmx-2022.2
Working dir: /home/ec2-user/Minimisation/lambda_19
Command line:
gmx_mpi -quiet --version
GROMACS version: 2022.2
Precision: mixed
Memory model: 64 bit
MPI library: MPI (CUDA-aware)
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 128)
GPU support: CUDA
SIMD instructions: AVX2_256
CPU FFT library: fftw-3.3.8-sse2-avx-avx2-avx2_128
GPU FFT library: cuFFT
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
C compiler: /opt/amazon/openmpi/bin/mpicc GNU 7.3.1
C compiler flags: -mavx2 -mfma -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -O3 -DNDEBUG
C++ compiler: /opt/amazon/openmpi/bin/mpicxx GNU 7.3.1
C++ compiler flags: -mavx2 -mfma -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -fopenmp -O3 -DNDEBUG
CUDA compiler: /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2021 NVIDIA Corporation;Built on Mon_Oct_11_21:27:02_PDT_2021;Cuda compilation tools, release 11.4, V11.4.152;Build cuda_11.4.r11.4/compiler.30521435_0
CUDA compiler flags:-std=c++14;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-Wno-deprecated-gpu-targets;-gencode;arch=compute_53,code=sm_53;-gencode;arch=compute_80,code=sm_80;-use_fast_math;-D_FORCE_INLINES;-mavx2 -mfma -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -fopenmp -O3 -DNDEBUG
CUDA driver: 11.60
CUDA runtime: 11.40
Steps to reproduce
>>> gmx_mpi grompp -f gromacs.mdp -c gromacs.gro -p gromacs.top -o gromacs.tpr
>>> gmx_mpi mdrun -deffnm gromacs
Steepest Descents converged to machine precision in 74 steps,
but did not reach the requested Fmax < 1000.
Potential Energy = -1.0135666e+05
Maximum force = 1.2266455e+04 on atom 17
Norm of force = 2.1724870e+02
>>> gmx_mpi mdrun -deffnm gromacs -nb cpu
Steepest Descents converged to Fmax < 1000 in 228 steps
Potential Energy = -1.0515849e+05
Maximum force = 8.8434900e+02 on atom 17
Norm of force = 4.4004402e+01
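To make the comparison between the two runs explicit, here is a minimal Python sketch that parses the final summaries quoted above and checks each against the requested Fmax. The helper names (`parse_em_summary`, `reached_tolerance`) are hypothetical and not part of GROMACS, and the regular expression assumes the summary keeps the "label = value" layout shown in this report.

```python
import re

# Matches lines such as "Maximum force = 1.2266455e+04 on atom 17".
# Assumption: the summary keeps the "<label> = <number>" layout shown in this report.
_FIELD = re.compile(
    r"^\s*(Potential Energy|Maximum force|Norm of force)\s*=\s*([-+0-9.eE]+)",
    re.MULTILINE,
)


def parse_em_summary(text):
    """Return {label: float} for the energy-minimisation summary in `text`."""
    return {label: float(value) for label, value in _FIELD.findall(text)}


def reached_tolerance(summary, emtol):
    """True if the reported maximum force is below the requested tolerance."""
    return summary["Maximum force"] < emtol


# Summaries copied verbatim from the two runs in this report.
gpu_text = """\
Potential Energy = -1.0135666e+05
Maximum force = 1.2266455e+04 on atom 17
Norm of force = 2.1724870e+02
"""
cpu_text = """\
Potential Energy = -1.0515849e+05
Maximum force = 8.8434900e+02 on atom 17
Norm of force = 4.4004402e+01
"""

gpu = parse_em_summary(gpu_text)
cpu = parse_em_summary(cpu_text)

EMTOL = 1000.0  # requested Fmax, taken from the mdrun messages above

for name, summary in (("GPU", gpu), ("CPU", cpu)):
    status = "converged" if reached_tolerance(summary, EMTOL) else "NOT converged"
    print(f"{name}: Fmax = {summary['Maximum force']:.4e} ({status})")

fmax_ratio = gpu["Maximum force"] / cpu["Maximum force"]
energy_gap = gpu["Potential Energy"] - cpu["Potential Energy"]
print(f"Fmax ratio GPU/CPU: {fmax_ratio:.1f}")
print(f"Potential energy gap (GPU - CPU): {energy_gap:.2f}")
```

With the numbers from this report, the GPU run ends with a maximum force more than ten times larger than the CPU run, and at a higher potential energy.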
What is the current bug behavior?
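With the GPU, minimisation stops at a higher-energy state (potential energy -1.0135666e+05 instead of -1.0515849e+05, with Fmax still above the requested 1000), which makes the system less stable.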
What did you expect the correct behavior to be?
The GPU and CPU runs should give comparable results.