# freeenergy/coulandvdwsequential_coul fails with separate PME rank and CUDA
## Summary

To reproduce, in the `tests/freeenergy/coulandvdwsequential_coul` directory:

```
$ mpirun -np 2 ../../../bin/gmx_mpi mdrun -notunepme -nb gpu -pme gpu -update cpu -ntomp 1 -npme 1
$ ../../../bin/gmx_mpi check -e ../../gromacs-regressiontests-release-2022/freeenergy/coulandvdwsequential_coul/reference_s.edr -e2 ener.edr -tol 0.001 -abstol 0.05 -lastener Potential
#....
Reading energy frame      0 time    0.000
step 0: block[1][  2] (-3.850891e+01 - -3.934799e+01)
step 0: block[3][  2] (1.540360e+01 - 1.573909e+01)
step 0: block[5][  2] (-7.701825e+00 - -7.869583e+00)
Reading energy frame     20 time    0.020
step 20: block[1][  2] (-5.535483e+01 - -4.876364e+01)
step 20: block[3][  2] (2.214181e+01 - 1.950552e+01)
step 20: block[5][  2] (-1.107116e+01 - -9.752747e+00)
Reading energy frame     40 time    0.040
step 40: block[1][  2] (-5.145402e+01 - -4.861850e+01)
step 40: block[3][  2] (2.058149e+01 - 1.944754e+01)
step 40: block[5][  2] (-1.029101e+01 - -9.723885e+00)
Last energy frame read 40 time 0.040
#....
```

There is a discrepancy already at step 0. Looking at the log, there is a large discrepancy in "dVcoul/dl", while the other energy terms agree closely.

Initially reported in #4471 by @gaurav.garg.

Reproduced on `dev-gpu04` (2xRTX2080Ti) with 9ff5c75c23a5bb1de3d3ec3e0e89e14d8f0a0e82 (2022.1), `cuda/11.5`, `gcc/11.2`, and `/nethome/aland/modules/modulefiles/mpich/4.0.0-cuda11.5`.

- Fails with or without setting `GMX_ENABLE_DIRECT_GPU_COMM`.
- Fails even when a single GPU is used.
- Fails with 2, 3, or 4 ranks.
- Keeping PME on the CPU or using a single MPI rank fixes the issue.
- Unlike #4471, this issue is not affected by `compute-sanitizer`.
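For convenience, here is a minimal sketch of a wrapper that sweeps the rank counts and PME placements tested above. The `GMX` and `REF_EDR` paths are placeholders for this particular build and regressiontests checkout and would need adapting; the script is assumed to be run from the test directory.

```bash
#!/usr/bin/env bash
# Sketch: sweep MPI rank counts and PME placement for the failing regression test.
# GMX and REF_EDR are assumptions; adjust to your build and regressiontests checkout.
set -u
GMX=../../../bin/gmx_mpi
REF_EDR=../../gromacs-regressiontests-release-2022/freeenergy/coulandvdwsequential_coul/reference_s.edr

for np in 2 3 4; do
    for pme in gpu cpu; do
        echo "=== np=${np}, pme=${pme} ==="
        # Run with one dedicated PME rank; only pme=gpu with np>=2 is expected to fail.
        mpirun -np "${np}" "${GMX}" mdrun -notunepme -nb gpu -pme "${pme}" \
            -update cpu -ntomp 1 -npme 1
        # Compare the produced energies against the reference .edr.
        "${GMX}" check -e "${REF_EDR}" -e2 ener.edr \
            -tol 0.001 -abstol 0.05 -lastener Potential
    done
done
```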