GPU force buffer ops, reduction and comms on virial steps
-
Currently, force buffer ops, force reduction and halo exchange fall back to the CPU on virial steps. PME-PP communication is still active, but it goes from the PME GPU to the PP CPU.
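The current step-dependent selection can be summarised in a minimal sketch. All struct, field and function names here are hypothetical and chosen for illustration. They are not the actual GROMACS workload flags. PME-PP communication is left out because it stays active on every step.

```cpp
#include <cassert>

// Hypothetical per-step flag; true on steps where the virial is computed.
struct StepFlags
{
    bool computeVirial;
};

// Hypothetical run-wide flags: which GPU codepaths are enabled at all.
struct RunFlags
{
    bool useGpuBufferOps;
    bool useGpuForceReduction;
    bool useGpuHaloExchange;
};

// Which codepaths actually run on the GPU for a given step.
struct ForcePathChoice
{
    bool gpuBufferOps;
    bool gpuForceReduction;
    bool gpuHaloExchange;
};

// Models the current behaviour: on virial steps, buffer ops, force
// reduction and halo exchange all fall back to the CPU.
ForcePathChoice chooseForcePath(const RunFlags& run, const StepFlags& step)
{
    const bool gpuAllowedThisStep = !step.computeVirial;
    return { run.useGpuBufferOps && gpuAllowedThisStep,
             run.useGpuForceReduction && gpuAllowedThisStep,
             run.useGpuHaloExchange && gpuAllowedThisStep };
}
```

Each extra per-step condition of this kind is a branch that do_force()/do_md() has to carry. Running the GPU codepaths on every step would remove it.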
-
Virial steps are infrequent, so having GPU force buffer ops, reduction and comms on virial steps would give an insignificant overall performance benefit.
-
However, making the codepaths uniform across steps could improve code quality by reducing the complexity of the conditionals in do_force()/do_md().
-
GPU force buffer ops on virial steps are relatively straightforward, but don't offer much improvement on their own:
- see patch at https://gerrit.gromacs.org/c/gromacs/+/15960
-
GPU force reduction on virial steps is much more complex:
- On non-virial steps, there is only one active force buffer, and both the NB and PME forces are reduced into it.
- On virial steps, there are two separate force buffers: forceOut.forceWithShiftForces().force() and forceOut.forceWithVirial().force_. The NB force is reduced into the former and the PME force into the latter.
- We currently have no concept of a second force buffer in the State Propagator or the GPU force reduction, and no concept of these distinct reductions.
- It seems that the additional complexity of introducing these concepts would outweigh any reduction in conditional complexity gained from having the GPU codepaths active on virial steps.
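The difference between the two reduction patterns can be shown in a CPU-side sketch of the bookkeeping only. This is not a GPU implementation. Each force buffer is simplified to one float per atom in place of rvec data. The helper names (`accumulate`, `reduceNonVirialStep`, `reduceVirialStep`) are hypothetical. Only the buffer names forceWithShiftForces/forceWithVirial come from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for a force buffer: one float per atom instead of rvec.
using ForceBuffer = std::vector<float>;

// Element-wise dst += src.
void accumulate(ForceBuffer& dst, const ForceBuffer& src)
{
    assert(dst.size() == src.size());
    for (std::size_t i = 0; i < dst.size(); ++i)
    {
        dst[i] += src[i];
    }
}

// Non-virial step: a single active buffer receives both NB and PME forces.
void reduceNonVirialStep(ForceBuffer& force, const ForceBuffer& nbForce, const ForceBuffer& pmeForce)
{
    accumulate(force, nbForce);
    accumulate(force, pmeForce);
}

// Virial step: NB forces go to the shift-force buffer and PME forces to the
// separate virial buffer, so the two contributions stay distinguishable
// until the virial has been computed.
void reduceVirialStep(ForceBuffer&       forceWithShiftForces,
                      ForceBuffer&       forceWithVirial,
                      const ForceBuffer& nbForce,
                      const ForceBuffer& pmeForce)
{
    accumulate(forceWithShiftForces, nbForce);
    accumulate(forceWithVirial, pmeForce);
}
```

The total force comes out the same on both kinds of step. The virial-step path, though, needs a second buffer and a second target for the reduction. The GPU force reduction and the State Propagator currently have no notion of either.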
-
GPU force halo exchange on virial steps is relatively straightforward, but it is not clear that it offers any benefit without GPU force reduction.
-
My feeling is that the simplest solution is to keep force buffer ops, reductions and halo exchange on the CPU for virial steps, as we do at the moment.