Major regression in v2025 for multi-GPU with thread-MPI
GROMACS v2025 no longer respects the GMX_ENABLE_DIRECT_GPU_COMM environment variable for PME-PP communication with thread-MPI on NVIDIA GPUs, as a result of https://gitlab.com/gromacs/gromacs/-/merge_requests/4946. This causes a major performance regression in many cases. For example, v2025.0 is 2X slower than v2024.5 for ADHD on 4xH100.

This will cause problems for internal NVIDIA testing and for the many NVIDIA GPU users whose workflows rely on thread-MPI.

The issue can be fixed by re-allowing (non-default) direct GPU PME-PP communication via the above environment variable, so that release-2025 behaves the same as previous releases with thread-MPI.