GPU PME Spread pipelining broken in SYCL
Summary
In the SYCL backend, the atom offsets are not applied during charge spreading. This produces incorrect results whenever PME pipelining is used, that is, when all of the following hold:
- PME is offloaded to GPU, and
- direct GPU communication is active, and
- several PP ranks are used.
Exact steps to reproduce
On gpu12:
$ module load cuda/11.7.1 cmake/3.24.2 ninja/1.10.0 openmpi/1.8.8-cuda6.5 clang/15.0.0 boost/1.75.0 /nethome/aland/modules/modulefiles/hipSYCL/0.9.4-cuda11.7.1
$ cmake ../.. -DCMAKE_CXX_COMPILER=clang++-15 -DCMAKE_C_COMPILER=clang-15 -DCMAKE_BUILD_TYPE=Release -DGMX_GPU=SYCL -DGMX_SYCL_HIPSYCL=ON -DHIPSYCL_TARGETS='cuda:sm_61,sm_70' -DGMX_MPI=ON
$ GMX_FORCE_GPU_AWARE_MPI=1 GMX_ENABLE_DIRECT_GPU_COMM=1 mpirun -np 3 gmx_mpi mdrun -nb gpu -pme gpu -update gpu -npme 1 -nsteps 1000 -ntomp 8 -pin on
The energy drift reported in the output file is around 1e-1 kJ/mol/ps for a 384k water box, far above the normal value of ~1e-4 kJ/mol/ps.
For developers: Why is this important?
Violating physics is not great.