C++ API for simulation input and output
Functionalities such as (hybrid) Monte Carlo, simulation replicas, replica exchange, and input preparation/manipulation share a need for API access to simulation inputs and outputs.
Additionally, efforts to limit the responsibilities of individual tools (and separate out convenience options) warrant light-weight ways to connect tools, including ways to filter or manipulate trajectory output before it hits the filesystem. See, for instance, #3286.
This issue is intended to collect a roadmap for design and development.
Related efforts include
- encapsulation, abstraction, and interface development under nb-lib
- restructuring of simulator launch, collaborations, and data structures related to expansion of the ModularSimulator (links from Pascal? Paul?)
- expansion of the MdModules framework (links from Christian? Others?)
- evolution of modular input handling
- evolution of the checkpoint facilities
- clarifying simulator program state and invariants (#3325, #2375)
- Confirm test coverage for parallel runs.
- Confirm test coverage for checkpointed "multisim" multi-simulation runs.
- Confirm test coverage for start when -cpi is specified but checkpoint file does not (yet) exist.
- Confirm test coverage for out-of-sync multisim case where some but not all checkpoint files are present.
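The checkpoint scenarios listed for test coverage can be summarized as a small decision function. This is a minimal sketch, not GROMACS code: `StartMode` and `chooseStartMode` are hypothetical names, and the policy shown (fresh start when no checkpoint exists, error when only some ensemble members have one) is an assumption that the tests would need to confirm against actual behavior.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical names; the policy below is an assumption for illustration.
enum class StartMode
{
    FreshStart,
    ContinueFromCheckpoint
};

// checkpointPresent[i] is true when the checkpoint file requested with -cpi
// exists for ensemble member i. A single simulation is an ensemble of one.
inline StartMode chooseStartMode(const std::vector<bool>& checkpointPresent)
{
    if (checkpointPresent.empty())
    {
        throw std::invalid_argument("at least one simulation is required");
    }
    const auto numPresent =
            std::count(checkpointPresent.begin(), checkpointPresent.end(), true);
    if (numPresent == 0)
    {
        // -cpi was given, but no checkpoint exists (yet).
        return StartMode::FreshStart;
    }
    if (static_cast<std::size_t>(numPresent) == checkpointPresent.size())
    {
        return StartMode::ContinueFromCheckpoint;
    }
    // Out-of-sync multisim: some but not all checkpoint files are present.
    throw std::runtime_error("inconsistent checkpoint files across simulations");
}
```

Each branch corresponds to one of the coverage items above, so a test suite could enumerate them directly.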
To clarify the scope of this issue, define some use cases.
Features and tools enabled by the API functionality described in this issue.
- Ensemble simulation / multi-sim
- Temperature replica exchange
- Hamiltonian replica exchange
- Monte Carlo rejection of a trajectory segment
- convert-tpr / gmxapi.modify_input
- Filesystem-decoupled input preparation and simulation
- Filesystem-decoupled simulation output handling
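As an illustration of the Monte Carlo rejection use case, the sketch below chains trajectory segments in memory: an accepted segment's output becomes the next input, and a rejected segment simply reuses the previous input. All types and functions here are hypothetical stand-ins, and the acceptance test is a plain Metropolis criterion on a single energy value, which is an assumption rather than the scheme any particular hybrid MC implementation would use.

```cpp
#include <cmath>
#include <memory>

// Hypothetical stand-ins; a real SimulationInput would carry far more data.
struct SimulationInput
{
    int    startStep;
    double energy;
};

struct SegmentOutput
{
    int    finalStep;
    double energy;
};

// Metropolis criterion. uniformSample is a random number in [0, 1) supplied
// by the caller, which keeps this function deterministic and easy to test.
inline bool metropolisAccept(double deltaEnergy, double kT, double uniformSample)
{
    return deltaEnergy <= 0.0 || uniformSample < std::exp(-deltaEnergy / kT);
}

// Produce the input for the next segment without touching the filesystem.
inline std::shared_ptr<const SimulationInput>
nextInput(const std::shared_ptr<const SimulationInput>& previous,
          const SegmentOutput&                           output,
          double                                         kT,
          double                                         uniformSample)
{
    if (metropolisAccept(output.energy - previous->energy, kT, uniformSample))
    {
        return std::make_shared<SimulationInput>(
                SimulationInput{ output.finalStep, output.energy });
    }
    // Rejected: the immutable previous input is reused as-is.
    return previous;
}
```

Because the input is held through a pointer to const, rejection costs nothing beyond copying a handle.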
API use cases driving features within this issue scope, supporting the scenarios expected within the application use cases above.
- Obtain a reference to the output of a simulation segment.
- Produce input for a simulation segment from the output of a simulation segment.
- Obtain a modified SimulationInput from an “editing” operation.
- Compose a SimulationInput.
- Decompose a SimulationInput (topology, microstate, simulation parameters, metadata, others?).
- Fingerprint a SimulationInput: identify the trajectory of which it is a part and the segment that will be produced (uniquely to the point of reproducibility and/or scientific relevance).
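To make the compose, decompose, edit, and fingerprint use cases concrete, here is a minimal sketch. The class layout, the `withParameters` method, and the use of an FNV-1a hash over serialized components are all assumptions for illustration; nothing here reflects an existing GROMACS interface, and a real fingerprint would need to address what counts as scientifically equivalent input.

```cpp
#include <cstdint>
#include <initializer_list>
#include <string>
#include <utility>

// Hypothetical stand-ins for the components named above; not GROMACS types.
struct Topology   { std::string data; };
struct Microstate { std::string data; };
struct Parameters { std::string data; };

class SimulationInput
{
public:
    // Compose from parts.
    SimulationInput(Topology topology, Microstate microstate, Parameters parameters) :
        topology_(std::move(topology)),
        microstate_(std::move(microstate)),
        parameters_(std::move(parameters))
    {
    }

    // Decompose into read-only parts.
    const Topology&   topology() const { return topology_; }
    const Microstate& microstate() const { return microstate_; }
    const Parameters& parameters() const { return parameters_; }

    // An "editing" operation returns a new object; the original is unchanged.
    SimulationInput withParameters(Parameters parameters) const
    {
        return SimulationInput(topology_, microstate_, std::move(parameters));
    }

    // 64-bit FNV-1a over the serialized parts (an illustrative choice only).
    std::uint64_t fingerprint() const
    {
        std::uint64_t hash = 14695981039346656037ULL;
        for (const std::string* part :
             { &topology_.data, &microstate_.data, &parameters_.data })
        {
            for (unsigned char byte : *part)
            {
                hash = (hash ^ byte) * 1099511628211ULL;
            }
            // Separator so that part boundaries also affect the hash.
            hash = (hash ^ 0xffU) * 1099511628211ULL;
        }
        return hash;
    }

private:
    Topology   topology_;
    Microstate microstate_;
    Parameters parameters_;
};
```

Two inputs composed from identical parts share a fingerprint, while an edited copy gets a new one, which is the property a client would rely on to tell trajectory segments apart.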
Library-internal use cases included by the above API implementation scenarios, or connected to the accompanying (re)factoring.
- Apply SimulationInput to consuming modules.
- Initialize volatile data (internal state) from the (immutable) record of input.
- Coordinate a Memento, or publish a light-weight (opaque) handle to simulator output or checkpoint (don’t bake in details of data locality or structure).
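The Memento idea can be sketched as follows: the simulator hands out an opaque handle that clients can hold and pass back, but cannot inspect. `Checkpoint` and `ToySimulator` are hypothetical names, and storing a copied vector behind a shared pointer is just one possible choice; the point is that the handle's public interface says nothing about where or how the data lives.

```cpp
#include <memory>
#include <utility>
#include <vector>

class ToySimulator;

// Opaque handle: no public accessors, so clients cannot depend on the
// structure or locality of the captured data.
class Checkpoint
{
private:
    friend class ToySimulator;
    explicit Checkpoint(std::shared_ptr<const std::vector<double>> data) :
        data_(std::move(data))
    {
    }
    std::shared_ptr<const std::vector<double>> data_;
};

// Hypothetical simulator whose whole state is a vector of positions.
class ToySimulator
{
public:
    explicit ToySimulator(std::vector<double> positions) :
        positions_(std::move(positions))
    {
    }

    void step()
    {
        for (double& x : positions_)
        {
            x += 1.0;
        }
    }

    // Capture the current state behind an opaque handle.
    Checkpoint save() const
    {
        return Checkpoint(std::make_shared<std::vector<double>>(positions_));
    }

    // Restore state previously captured by save().
    void restore(const Checkpoint& checkpoint) { positions_ = *checkpoint.data_; }

    const std::vector<double>& positions() const { return positions_; }

private:
    std::vector<double> positions_;
};
```

Copying a `Checkpoint` only copies a handle, so it stays cheap to pass around even if the captured state is large.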
Interactions between GROMACS internal modules and the new API facilities or supporting infrastructure.
- (Re)initialize internal state.
- Dump internal state.
- Confirm input validity.
- Register information or collaboration dependencies.
- Register, publish, or be able to describe available outputs.
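One way to picture these module interactions is a small interface that each module implements. The sketch below is hypothetical throughout: `ISimulationModule`, `KeyValues`, and `ToyThermostat` are invented for illustration and are not the existing MdModules interfaces. Dependency registration is omitted to keep the example short.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for structured input and state records.
using KeyValues = std::map<std::string, double>;

// Hypothetical interface covering validation, (re)initialization,
// state dump, and output description.
class ISimulationModule
{
public:
    virtual ~ISimulationModule() = default;
    virtual bool                     isValid(const KeyValues& input) const = 0;
    virtual void                     initialize(const KeyValues& input)    = 0;
    virtual KeyValues                dumpState() const                     = 0;
    virtual std::vector<std::string> availableOutputs() const              = 0;
};

class ToyThermostat final : public ISimulationModule
{
public:
    bool isValid(const KeyValues& input) const override
    {
        const auto found = input.find("ref-t");
        return found != input.end() && found->second > 0.0;
    }

    // May be called repeatedly; each call resets the mutable state.
    void initialize(const KeyValues& input) override
    {
        referenceTemperature_ = input.at("ref-t");
        stepsTaken_           = 0;
    }

    void step() { ++stepsTaken_; }

    KeyValues dumpState() const override
    {
        return { { "ref-t", referenceTemperature_ },
                 { "steps", static_cast<double>(stepsTaken_) } };
    }

    std::vector<std::string> availableOutputs() const override
    {
        return { "temperature" };
    }

private:
    double referenceTemperature_ = 0.0;
    long   stepsTaken_           = 0;
};
```

Keeping `isValid` separate from `initialize` lets a client reject bad input before any module state is touched.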
- Distinguish between (immutable) input and (mutable) program state (clarify stages of initialization, reform inputrec use cases).
- Clarify the information hierarchy represented by SimulationInput (and SimulationOutput).
- Maximize reusability of the MD runner:
  - allow SimulationInput to be reapplied in a process lifetime
  - understand reusable resources or data structures that do not need reinitialization
- Define SimulationState encapsulation, or coordinate with its road map.
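The split between immutable input and mutable state, and the goal of reapplying one input within a process lifetime, might look like the following. This is a toy sketch under stated assumptions: `SimulationInput`, `SimulationState`, `initializeState`, and `ToyRunner` are hypothetical, and the "dynamics" is a placeholder increment.

```cpp
#include <memory>
#include <vector>

// Immutable record of prescribed work (hypothetical). Immutability is
// enforced by sharing it only through pointers to const.
struct SimulationInput
{
    std::vector<double> initialPositions;
    int                 nsteps;
};

// Mutable working data, always derived from the input record.
struct SimulationState
{
    std::vector<double> positions;
    int                 step;
};

inline SimulationState initializeState(const SimulationInput& input)
{
    return SimulationState{ input.initialPositions, 0 };
}

// A runner that keeps no per-run state, so the same input can be
// applied any number of times.
class ToyRunner
{
public:
    SimulationState run(const std::shared_ptr<const SimulationInput>& input) const
    {
        SimulationState state = initializeState(*input);
        for (; state.step < input->nsteps; ++state.step)
        {
            for (double& x : state.positions)
            {
                x += 0.5; // placeholder for real dynamics
            }
        }
        return state;
    }
};
```

Running twice from the same input gives identical results here, and the input record is untouched afterwards, which is the invariant the goals above are aiming for.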
To further clarify the scope of this issue, identify related tasks that should have a more explicit road map, but which are (currently) considered beyond the scope of this feature topic.
- Decouple Mdrunner collaborations from assumptions of file-based I/O (Remove the ArrayRef from gmx::Mdrunner.)
- Modernize/unify run time simulation options handling (#2877)
- clean up the mdrun call hierarchy and program flow (input aggregation, acquisition of run time resources, component initialization and binding, creation protocols, “runner” versus “simulator”)
- Decouple Mdrunner from membed and essential-dynamics implementation details.
- Logging abstraction (#2999)
Use the new SimulationInput abstraction as the focal point for restructuring simulation setup and simulator initialization in flexible API-friendly ways. Work towards clearer representations of prescribed work while decoupling from specific file formats. Allow lighter weight representations and transformations of simulation input for ensemble methods and other many-simulation workflows.
- Separate t_inputrec into the representation of (immutable) input data versus the remaining (mutable) working data.
Provide a way to initialize
- Read files during SimulationInput construction and store in serialization memory buffer or copyable versions of t_state. Let the existence of TPR and checkpoint input files be client-level concerns, and encapsulate their handling from the rest of the mdrun call stack.
- Create RAII holders for shared data that does not already have a clear owner at the Mdrunner::mdrunner level.
- Apply SimulationInput directly to its consumers. Remove legacy structures from the Mdrunner::mdrunner() level that are used only to ferry data between SimulationInput and its consumers.
- Allow the SimulationInput to manage distributed data. (Encapsulate the data locality management.)
- Identify and implement some minor transformations that are possible to SimulationInput without re-preprocessing.
- Allow SimulationInput to be used to generate the filesystem representation of simulation input data (TPR+CPT or some new format).
- Have grompp produce complete simulation input (including the structures that are not initialized until the checkpointing has been set up).
- Decouple grompp from the file format(s), and just produce a SimulationInput object.
- Reimplement gmxapi.simulation operations in terms of SimulationInput.
- Reimplement gmxapi.modify_input and convert-tpr in terms of SimulationInput. (also ref #3295)
- Allow the Simulator (or its output object) to produce a SimulationInput.
- Converge SimulationInput development with hybrid MD/MC development and Nb-lib input handling.
- more (please contribute)
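The road map items about reading files during SimulationInput construction, and later regenerating a filesystem representation, can be sketched with standard streams. In this hypothetical example the client owns the decision of where bytes come from (a file, a socket, a string), and the library only sees in-memory buffers. The names `readSimulationInput` and `writeSimulationInput` and the two-buffer layout are assumptions, not an existing interface.

```cpp
#include <istream>
#include <iterator>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical in-memory record: serialized buffers rather than file names.
struct SimulationInput
{
    std::string runInput;   // serialized TPR-like content
    std::string checkpoint; // empty when no checkpoint was supplied
};

// Read an entire stream into a memory buffer.
inline std::string readAll(std::istream& in)
{
    return std::string((std::istreambuf_iterator<char>(in)),
                       std::istreambuf_iterator<char>());
}

// All reading happens at construction time. Whether the streams are backed
// by files is the client's concern.
inline std::shared_ptr<const SimulationInput>
readSimulationInput(std::istream& runInput, std::istream* checkpoint = nullptr)
{
    auto input      = std::make_shared<SimulationInput>();
    input->runInput = readAll(runInput);
    if (checkpoint != nullptr)
    {
        input->checkpoint = readAll(*checkpoint);
    }
    return input;
}

// Regenerate an external representation from the in-memory record.
inline void writeSimulationInput(const SimulationInput& input,
                                 std::ostream&          runInputOut,
                                 std::ostream&          checkpointOut)
{
    runInputOut << input.runInput;
    checkpointOut << input.checkpoint;
}
```

A client that does want files would pass `std::ifstream` and `std::ofstream` objects; an ensemble workflow could pass string streams and never touch the disk.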
A complete concept of the hierarchy of information comprising SimulationInput should be explored, but is neither necessary nor likely for near-term efforts.
Criteria for completion
This issue may remain open as long as it is a useful road map, but can likely be considered “resolved” when the API use cases to support the targeted applications are well understood, and either implemented or independently tracked on another road map.
(from redmine: issue id 3379, created on 2020-02-13 by eirrgang)