Energy minimisation does not reach the requested Fmax when the GPU is used
**Summary**

When I energy-minimise a system on the GPU, I get a maximum force of 1.2266455e+04. When the same tpr file is run with the `-nb cpu` flag, the maximum force drops to 8.8434900e+02. The simulation is just a simple compound being decoupled in water, which should be quite an easy system.

**GROMACS version**

```
:-) GROMACS - gmx_mpi, 2022.2 (-:
Executable: /opt/MD-software/gmx-2022.2/bin/gmx_mpi
Data prefix: /opt/MD-software/gmx-2022.2
Working dir: /home/ec2-user/Minimisation/lambda_19
Command line: gmx_mpi -quiet --version
GROMACS version: 2022.2
Precision: mixed
Memory model: 64 bit
MPI library: MPI (CUDA-aware)
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 128)
GPU support: CUDA
SIMD instructions: AVX2_256
CPU FFT library: fftw-3.3.8-sse2-avx-avx2-avx2_128
GPU FFT library: cuFFT
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
C compiler: /opt/amazon/openmpi/bin/mpicc GNU 7.3.1
C compiler flags: -mavx2 -mfma -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -O3 -DNDEBUG
C++ compiler: /opt/amazon/openmpi/bin/mpicxx GNU 7.3.1
C++ compiler flags: -mavx2 -mfma -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -fopenmp -O3 -DNDEBUG
CUDA compiler: /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2021 NVIDIA Corporation;Built on Mon_Oct_11_21:27:02_PDT_2021;Cuda compilation tools, release 11.4, V11.4.152;Build cuda_11.4.r11.4/compiler.30521435_0
CUDA compiler flags:-std=c++14;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-Wno-deprecated-gpu-targets;-gencode;arch=compute_53,code=sm_53;-gencode;arch=compute_80,code=sm_80;-use_fast_math;-D_FORCE_INLINES;-mavx2 -mfma -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -fopenmp -O3 -DNDEBUG
CUDA driver: 11.60
CUDA runtime: 11.40
```

**Steps to reproduce**

```
>>> gmx_mpi grompp -f gromacs.mdp -c gromacs.gro -p gromacs.top -o gromacs.tpr
>>> gmx_mpi mdrun -deffnm gromacs
Steepest Descents converged to machine precision in 74 steps,
but did not reach the requested Fmax < 1000.
Potential Energy = -1.0135666e+05
Maximum force = 1.2266455e+04 on atom 17
Norm of force = 2.1724870e+02
>>> gmx_mpi mdrun -deffnm gromacs -nb cpu
Steepest Descents converged to Fmax < 1000 in 228 steps
Potential Energy = -1.0515849e+05
Maximum force = 8.8434900e+02 on atom 17
Norm of force = 4.4004402e+01
```

**What is the current bug behavior?**

With the GPU, the minimisation stops after 74 steps in a higher-energy state (potential energy -1.0135666e+05, versus -1.0515849e+05 on the CPU) without reaching the requested Fmax < 1000, which leaves the system less stable.

**What did you expect the correct behavior to be?**

The GPU version and the CPU version should give comparable results.

[Archive.zip](/uploads/2fafa702713723fb77e66c86071fa19b/Archive.zip)
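
For anyone comparing the two runs, here is a minimal sketch of how the final `Maximum force` values can be pulled out of the mdrun output and compared. The helper name `parse_fmax` and the regular expression are assumptions made for illustration and are not part of GROMACS. The two strings are copied from the output reported above.

```python
import re

# Matches the summary line printed at the end of a minimisation,
# e.g. "Maximum force = 1.2266455e+04 on atom 17".
# The pattern is an assumption based on the output shown in this report.
FMAX_RE = re.compile(r"Maximum force\s*=\s*([0-9.eE+-]+)\s+on atom\s+(\d+)")


def parse_fmax(text):
    """Return (max_force, atom_index) from mdrun minimisation output."""
    match = FMAX_RE.search(text)
    if match is None:
        raise ValueError("no 'Maximum force' line found")
    return float(match.group(1)), int(match.group(2))


# Summary lines copied from the two runs in this report.
gpu_output = (
    "Potential Energy = -1.0135666e+05\n"
    "Maximum force = 1.2266455e+04 on atom 17\n"
    "Norm of force = 2.1724870e+02\n"
)
cpu_output = (
    "Potential Energy = -1.0515849e+05\n"
    "Maximum force = 8.8434900e+02 on atom 17\n"
    "Norm of force = 4.4004402e+01\n"
)

gpu_fmax, gpu_atom = parse_fmax(gpu_output)
cpu_fmax, cpu_atom = parse_fmax(cpu_output)

print(f"GPU: Fmax = {gpu_fmax:.7e} on atom {gpu_atom}")
print(f"CPU: Fmax = {cpu_fmax:.7e} on atom {cpu_atom}")
print(f"GPU/CPU Fmax ratio = {gpu_fmax / cpu_fmax:.2f}")
```

Both runs report the largest force on the same atom (17), so the comparison is between like quantities. The same helper could be pointed at the text of the `.log` files instead of the pasted strings, assuming they contain the same summary line.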