Allow MPI communicator to be provided by client.
This task is essentially to finish GMX11 from #2585 (closed): allow client software to use MPI at a high level whether or not libgromacs is built for MPI parallelism.
The client is then able to provide the library with a sub-communicator, restricting a given SimulationContext to a subset of the run-time MPI ranks. For example, client code could manage a trajectory ensemble in which 8 simulations each run on 4 nodes of a 32-node HPC job.
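A minimal client-side sketch of that example, assuming one MPI rank per node and using only standard MPI calls; the hand-off to the library is left as a placeholder:

```cpp
#include <mpi.h>

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);

    int worldRank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &worldRank);

    // With one rank per node: ranks 0-3 form ensemble member 0,
    // ranks 4-7 form member 1, and so on.
    const int ensembleMember = worldRank / 4;
    MPI_Comm simulationComm = MPI_COMM_NULL;
    MPI_Comm_split(MPI_COMM_WORLD, ensembleMember, worldRank, &simulationComm);

    // ... hand simulationComm to the library when creating the
    // SimulationContext for this ensemble member ...

    MPI_Comm_free(&simulationComm);
    MPI_Finalize();
    return 0;
}
```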
The MPI communicator is already provided to Mdrunner via the SimulationContext. We will extend the MpiContextManager RAII helper to provide scoped access to an appropriate communicator.
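A much-simplified sketch of the intended RAII pattern (illustrative only; the real MpiContextManager differs in detail). The manager takes ownership of a duplicate of the client's communicator for the duration of its lifetime and hands out scoped access for the SimulationContext:

```cpp
#include <mpi.h>

#include <stdexcept>

class MpiContextManager
{
public:
    // Borrow a client-provided communicator; duplicate it so that
    // library traffic cannot collide with client traffic on the
    // same handle.
    explicit MpiContextManager(MPI_Comm clientComm)
    {
        int initialized = 0;
        MPI_Initialized(&initialized);
        if (!initialized || clientComm == MPI_COMM_NULL)
        {
            throw std::invalid_argument(
                    "Client must provide a valid communicator from an initialized MPI environment.");
        }
        MPI_Comm_dup(clientComm, &communicator_);
    }

    ~MpiContextManager()
    {
        if (communicator_ != MPI_COMM_NULL)
        {
            MPI_Comm_free(&communicator_);
        }
    }

    // Non-copyable: exactly one owner of the duplicated communicator.
    MpiContextManager(const MpiContextManager&) = delete;
    MpiContextManager& operator=(const MpiContextManager&) = delete;

    // Scoped access for the SimulationContext.
    MPI_Comm communicator() const { return communicator_; }

private:
    MPI_Comm communicator_ = MPI_COMM_NULL;
};
```

Duplicating the communicator is one plausible design choice here: it keeps library collectives from interfering with client communication that reuses the original handle.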
However, there are implicitly two versions of SimulationContext, depending on whether gmxmpi.h references a real MPI implementation or tMPI. (In the latter case, the communications environment has not yet been initialized when the SimulationContext is created, but we defer reevaluation of those semantics to follow-up issues.) Importantly, the creator of the SimulationContext may be exposed both to tMPI GROMACS internals and to an MPI implementation in the client environment.
To reconcile this, we reaffirm that tMPI is an internal detail, and provide a translation layer to insulate the client-provided MPI_Comm from translation units that might have tMPI symbols.
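One way to build such a layer is type erasure, sketched below with hypothetical names (this is not the actual public API adopted in the merge requests that follow). The concrete communicator type appears only in translation units compiled against the client's real MPI headers, so tMPI-aware internals never see it:

```cpp
#include <memory>

// Public header: includes neither mpi.h nor tMPI headers.
class CommunicatorHolder
{
public:
    virtual ~CommunicatorHolder() = default;
};

// Instantiated in the client's translation unit, where CommT is the
// client's real MPI_Comm type.
template<typename CommT>
class CommunicatorHolderImpl : public CommunicatorHolder
{
public:
    explicit CommunicatorHolderImpl(CommT comm) : comm_(comm) {}
    CommT communicator() const { return comm_; }

private:
    CommT comm_;
};

// Client-side helper: erase the concrete communicator type before it
// crosses the library boundary.
template<typename CommT>
std::unique_ptr<CommunicatorHolder> holdCommunicator(CommT comm)
{
    return std::make_unique<CommunicatorHolderImpl<CommT>>(comm);
}

// Library-side recovery happens only in a translation unit compiled
// against the same real MPI headers as the client, e.g.:
//   auto* held = dynamic_cast<CommunicatorHolderImpl<MPI_Comm>*>(erased.get());
```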
See also #2395 and #3307 (closed).
(Discussed for some time, but reaffirmed at the August 2020 planning meeting as a deliverable on the 2021 release roadmap.)
Tasks
- !474 (merged) Describe the design and provide the public headers.
- !605 (merged) Move the MpiContextManager instance from "Session" to "Context" scope.
- !606 (merged) Give MpiContextManager the additional responsibility of providing the communicator for the SimulationContext.
- !606 (merged) Allow MPI-enabled clients to provide a communicator to MPI-enabled GROMACS.
- !607 (merged) Allow the new interface to support MPI-enabled clients for both tMPI and MPI GROMACS as abstractly as possible (see the sketch after this list).
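As a hypothetical sketch of what that abstraction has to accommodate: in a tMPI build, the library spawns its "ranks" as threads within a single process, so a client-provided (real) MPI communicator can sensibly be accepted only if this process is its sole member. Names below are illustrative, not the actual validation code:

```cpp
#include <mpi.h>

#include <stdexcept>

// Hypothetical check for a tMPI build of the library: a client-side
// (real) MPI communicator is acceptable only when this process is the
// sole participant, because tMPI cannot span multiple client ranks.
void checkClientCommForThreadMpiBuild(MPI_Comm clientComm)
{
    if (clientComm == MPI_COMM_NULL)
    {
        throw std::invalid_argument("A valid communicator is required.");
    }
    int size = 0;
    MPI_Comm_size(clientComm, &size);
    if (size != 1)
    {
        throw std::invalid_argument(
                "tMPI-enabled GROMACS accepts only a single-rank communicator.");
    }
}
```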
Follow-up
Deferred
Future work can further decouple the multisim code from libgromacs internals: Mdrunner can be sensibly simplified by separating out the concerns of concurrently executing simulations.