Make gmxapi.operation compatible with MPI-based ensemble management.
The initial execution management code in gmxapi.operation is very minimal. Ensemble awareness is limited, and only serial execution of operations for each ensemble member is supported.
Note that gmxapi.simulation.mdrun uses the legacy gmxapi.simulation.context module for parallel execution of ensemble simulations. While mpi4py is required by gmxapi.simulation.context, gmxapi.operation was not aware of it. This meant that mixing gmxapi.simulation operations with, say, gmxapi.commandline operations might not behave as intended for ensembles (non-gmxapi.simulation work would be duplicated across ranks).
We will need to confirm that both naively parallel and broadcast data flow work correctly across ranks.
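For context, the following sketch shows the kind of mixed workflow affected. It uses calls from the public gmxapi Python API (read_tpr, mdrun, commandline_operation); the TPR file names and the `gmx check` post-processing step are placeholders chosen for illustration.

```python
import gmxapi as gmx

# An ensemble is expressed by providing a list of inputs; here, two
# (placeholder) TPR files produce an ensemble of width 2.
simulation_input = gmx.read_tpr(['run0.tpr', 'run1.tpr'])

# MPI-aware ensemble simulation: members are distributed across ranks.
md = gmx.mdrun(simulation_input)

# A non-MPI-aware command-line task consuming ensemble output. Without the
# changes described below, work like this could be silently duplicated on
# every rank instead of being mapped to ensemble members or run once.
check = gmx.commandline_operation(
    'gmx',
    arguments=['check'],
    input_files={'-f': md.output.trajectory})

check.run()
```

Such a script would typically be launched with something like `mpiexec -n 2 python -m mpi4py workflow.py`, so that mpi4py initializes MPI before gmxapi work is dispatched.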
Updates
Some additional bugs were identified and fixed in the submitted patch.
- Ensemble width for subgraph variables and their updates is now clarified when the subgraph instance is built.
- while_loop explicitly behaves as an AllGather of the results it wraps, and correctly represents ensemble outputs as having an array dimension (see the sketch after this list).
- Naming of nodes in the workflow graph (operation instance identifiers) is clarified, normalized, and made consistent across the ensemble for operations in the gmxapi.simulation module.
- A distinction is clarified between work that is and is not duplicated on each MPI rank. Generally, non-MPI-aware tasks (that are not sufficiently integrated with gmxapi) should only be executed from a single process, whereas other tasks must be executed on all ranks (such as the subgraph+while_loop logic and the MPI-aware ensemble mdrun task). A new allow_duplicate annotation determines whether tasks should be launched on all ranks, or whether they should run on a single rank and share their results. (The implementation is minimal and naive, with much room for future optimization, but the solution seems appropriate for now.)
- For subgraphs and ensemble subgraphs, we recognized that Futures provided by the user to the subgraph should not be modified ("reset") during loop execution. In addition to much more rigorous handling and repackaging of inputs to subgraph variables, we introduce the ability to block the propagation of reset() to data providers.
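As a reference point, here is a minimal sketch of the subgraph + while_loop pattern that the linked test exercises, adapted from the documented gmxapi example. The add_float and less_than helpers are illustrative wrapped functions, not part of the gmxapi package.

```python
import gmxapi as gmx

# Illustrative wrapped functions (not part of the gmxapi package).
@gmx.function_wrapper(output={'data': float})
def add_float(a: float, b: float) -> float:
    return a + b

@gmx.function_wrapper(output={'data': bool})
def less_than(lhs: float, rhs: float) -> bool:
    return lhs < rhs

# Declare loop variables with defaults; ensemble width is established when
# the subgraph instance is built from the inputs bound to these variables.
subgraph = gmx.subgraph(variables={'float_with_default': 1.0, 'bool_data': True})
with subgraph:
    subgraph.float_with_default = add_float(
        a=subgraph.float_with_default, b=1.0).output.data
    subgraph.bool_data = less_than(
        lhs=subgraph.float_with_default, rhs=6.0).output.data

# Iterate while the condition variable remains True.
loop = gmx.while_loop(operation=subgraph, condition=subgraph.bool_data)
handle = loop()
print(handle.output.float_with_default.result())  # 6.0 in the scalar case
```

With list (ensemble) inputs bound to the subgraph variables, the loop output carries an additional array dimension, consistent with the AllGather behavior described above.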
Examples
Minimal tests of the necessary functionality are at https://gitlab.com/gromacs/gromacs/-/blob/8342ea76b1065524ad20768742a1bc859c84f4e4/python_packaging/src/test/test_subgraph.py#L98
A richer example is at https://github.com/kassonlab/gmxapi-tutorials/blob/main/examples/fs-peptide.py
Deferred
In conjunction with supporting parallel execution, we should expand the interface between the Context and operations to describe co-scheduling requirements and data locality issues.
Additionally, we need to improve the interaction between Contexts, such as with subscribability of Futures.
This is also related to data shaping issues (#2994 (closed)) and management of working files.
Some additional operations (e.g. scatter(), gather(), and reduce()) may be needed.

Update: some data shape transformation logic has been formalized, and two additional call-back facilities have been introduced to allow ResourceManagers to send and receive results between ranks when allow_duplicate=False.
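The new call-back facilities are internal to the ResourceManager, but the pattern they enable when allow_duplicate=False is essentially single-rank execution followed by a broadcast, roughly as in this mpi4py sketch (illustrative only; the dictionary payload is a placeholder, not gmxapi data structures).

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

result = None
if rank == 0:
    # Stand-in for a non-MPI-aware task that should execute exactly once.
    result = {'output_file': 'analysis.xvg', 'value': 42.0}

# All ranks participate in the broadcast; afterwards every rank holds the
# same result, so downstream consumers see a consistent Future value.
result = comm.bcast(result, root=0)
print(f'rank {rank}: {result}')
```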