C++ API for simulation input and output

Functionalities such as (hybrid) Monte Carlo, simulation replicas, replica exchange, and input preparation/manipulation share a need for API access to simulation inputs and outputs.

Additionally, efforts to limit the responsibilities of individual tools (and separate out convenience options) warrant light-weight ways to connect tools, including ways to filter or manipulate trajectory output before it hits the filesystem. See, for instance, #3286 (closed).

This issue is intended to collect a roadmap for design and development.

Related efforts include:

- Encapsulation, abstraction, and interface development under nb-lib
- Restructuring of simulator launch, collaborations, and data structures related to the expansion of the ModularSimulator (links from Pascal? Paul?)
- Expansion of the MdModules framework (links from Christian? Others?)
- Evolution of modular input handling
- Evolution of the checkpoint facilities
- Clarification of simulator program state and invariants (#3325 (closed), #2375 (closed))

Preliminary tasks

- Confirm test coverage for parallel runs.
- Confirm test coverage for checkpointed multi-simulation ("multisim") runs.
- Confirm test coverage for starting when -cpi is specified but the checkpoint file does not (yet) exist.
- Confirm test coverage for the out-of-sync multisim case in which some, but not all, checkpoint files are present.

Use cases

To clarify the scope of this issue, define some use cases.

Application level

Features and tools enabled by the API functionality described in this issue.

Ensemble simulation / multi-sim

Temperature replica exchange

Hamiltonian replica exchange

Monte Carlo rejection of a trajectory segment

convert-tpr / gmxapi.modify_input

`gmx dump`

grompp

nb-lib translation

Filesystem-decoupled input preparation and simulation

Filesystem-decoupled simulation output handling

API level

API use cases that drive features within the scope of this issue and support the scenarios expected in the application use cases above.

Obtain a reference to the output of a simulation segment.

Produce input for a simulation segment from the output of a simulation segment.
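
The two use cases above form a chain in which each segment's output feeds the next segment's input. The following is a minimal C++ sketch of that chaining, assuming immutable records shared through `std::shared_ptr<const T>`. Every type and function here (`sketch::SimulationInput`, `SimulationOutput`, `runSegment`, `inputFromOutput`) is hypothetical and is neither the existing `gmx::SimulationInput` class nor any GROMACS API. `runSegment` does no physics; it only advances the step counter.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

namespace sketch
{
// Hypothetical stand-ins for what a segment consumes and produces.
struct Microstate
{
    std::int64_t        step = 0;
    std::vector<double> positions;
};

struct SimulationParameters
{
    std::int64_t nsteps               = 0;
    double       referenceTemperature = 300.0;
};

struct SimulationInput
{
    SimulationParameters parameters;
    Microstate           microstate;
};

struct SimulationOutput
{
    Microstate finalMicrostate;
};

// Handles to immutable records: cheap to copy and to pass between tools.
using InputHandle  = std::shared_ptr<const SimulationInput>;
using OutputHandle = std::shared_ptr<const SimulationOutput>;

// Placeholder for running a segment: it only advances the step counter.
OutputHandle runSegment(const InputHandle& input)
{
    SimulationOutput output{ input->microstate };
    output.finalMicrostate.step += input->parameters.nsteps;
    return std::make_shared<SimulationOutput>(std::move(output));
}

// Continue the trajectory: keep the parameters, take the microstate from the output.
InputHandle inputFromOutput(const SimulationInput& previous, const SimulationOutput& output)
{
    // Sanity check (debug builds only): the particle count must not change.
    assert(output.finalMicrostate.positions.size() == previous.microstate.positions.size());
    return std::make_shared<SimulationInput>(SimulationInput{ previous.parameters, output.finalMicrostate });
}
} // namespace sketch
```

Because no record is mutated, the first input stays valid after the second one is prepared; a replica exchange or Monte Carlo rejection step would presumably rely on exactly that to fall back to an earlier segment.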

Obtain a modified SimulationInput from an “editing” operation.
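
One reading of an "editing" operation is a pure function: copy the input, modify the copy, and return it, leaving the original untouched. The sketch below assumes that reading. `editInput`, `setTemperature`, and the three-field record are hypothetical illustrations, not GROMACS or gmxapi functions; the example edit is the kind of change one might make when preparing a temperature replica from a shared base input.

```cpp
#include <cassert>
#include <functional>
#include <string>

namespace sketch
{
// Hypothetical, heavily simplified input record.
struct SimulationInput
{
    double      referenceTemperature = 300.0;
    long        nsteps               = 0;
    std::string label;
};

using InputEdit = std::function<void(SimulationInput&)>;

// An editing operation never touches its argument: it copies the input,
// applies the edit to the copy, and returns the copy.
SimulationInput editInput(const SimulationInput& original, const InputEdit& edit)
{
    SimulationInput modified = original;
    edit(modified);
    return modified;
}

// Example edit factory, e.g. for preparing one temperature replica.
InputEdit setTemperature(double temperature)
{
    assert(temperature > 0.0);
    return [temperature](SimulationInput& input) { input.referenceTemperature = temperature; };
}
} // namespace sketch
```

Fields the edit does not touch (here `nsteps` and `label`) carry over unchanged into the result.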

Compose a SimulationInput.

Decompose a SimulationInput (topology, microstate, simulation parameters, metadata, others?).
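
Composition and decomposition can be viewed as a round trip. This minimal sketch assumes the four components named above plus one consistency check at composition time (three coordinates per atom). All names are hypothetical. The check uses `assert`, so it only runs in debug builds; a real implementation would more likely report errors explicitly.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

namespace sketch
{
// Hypothetical components of a simulation input.
struct Topology
{
    std::vector<std::string> atomNames;
};
struct Microstate
{
    std::vector<double> coordinates; // x, y, z per atom
};
struct Parameters
{
    long   nsteps = 0;
    double dt     = 0.002;
};
using Metadata = std::map<std::string, std::string>;

struct InputComponents
{
    Topology   topology;
    Microstate microstate;
    Parameters parameters;
    Metadata   metadata;
};

class SimulationInput
{
public:
    // Compose: check (debug builds only) that the parts agree before accepting them.
    static SimulationInput compose(InputComponents components)
    {
        assert(components.microstate.coordinates.size() == 3 * components.topology.atomNames.size());
        return SimulationInput(std::move(components));
    }

    // Decompose: return copies of the parts; the input itself is unchanged.
    InputComponents decompose() const { return components_; }

private:
    explicit SimulationInput(InputComponents components) : components_(std::move(components)) {}

    InputComponents components_;
};
} // namespace sketch
```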

Fingerprint a SimulationInput: identify the trajectory of which it is a part and the segment it will produce, uniquely to the point of reproducibility and/or scientific relevance.
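
Fingerprinting could be approached by hashing exactly the fields that determine the trajectory and the segment while ignoring incidental metadata. The sketch below uses 64-bit FNV-1a only for brevity: it is not collision resistant, and hashing raw `double` bytes is platform dependent, so a production fingerprint would need a cryptographic hash and a canonical serialization. The record layout and the choice of which fields count as "relevant" are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>
#include <vector>

namespace sketch
{
// 64-bit FNV-1a: deterministic and simple, but NOT collision resistant.
class Fnv1a
{
public:
    void addBytes(const void* data, std::size_t size)
    {
        const auto* bytes = static_cast<const unsigned char*>(data);
        for (std::size_t i = 0; i < size; ++i)
        {
            hash_ ^= bytes[i];
            hash_ *= 1099511628211ULL; // FNV 64-bit prime
        }
    }
    template<typename T>
    void add(const T& value)
    {
        static_assert(std::is_trivially_copyable<T>::value, "hash raw bytes only for trivial types");
        addBytes(&value, sizeof(T));
    }
    // Length-prefix strings so that ("ab", "c") and ("a", "bc") hash differently.
    void add(const std::string& text)
    {
        add(text.size());
        addBytes(text.data(), text.size());
    }
    std::uint64_t value() const { return hash_; }

private:
    std::uint64_t hash_ = 14695981039346656037ULL; // FNV 64-bit offset basis
};

// Hypothetical input record.
struct SimulationInput
{
    std::string         trajectoryId; // which trajectory this segment extends
    std::int64_t        initialStep = 0;
    std::int64_t        nsteps      = 0;
    double              dt          = 0.002;
    std::vector<double> coordinates;
    std::string         comment; // metadata deliberately left out of the fingerprint
};

std::uint64_t fingerprint(const SimulationInput& input)
{
    assert(!input.trajectoryId.empty());
    Fnv1a hash;
    hash.add(input.trajectoryId);
    hash.add(input.initialStep);
    hash.add(input.nsteps);
    hash.add(input.dt);
    hash.add(input.coordinates.size());
    for (double x : input.coordinates)
    {
        hash.add(x);
    }
    return hash.value();
}
} // namespace sketch
```

Two inputs that differ only in `comment` get the same fingerprint, while changing where the segment starts changes it.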

Library level

Library-internal use cases implied by the API implementation scenarios above, or connected to the accompanying (re)factoring.

Apply SimulationInput to consuming modules.

Initialize volatile data (internal state) from the (immutable) record of input.
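
A minimal sketch of deriving volatile state from an immutable input record, assuming the state object keeps a `std::shared_ptr<const ...>` to its input so that it can be reset (reinitialized) at any time. `RuntimeState` and the simplified `SimulationInput` are hypothetical; `advance` is a placeholder position update, not a real integrator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

namespace sketch
{
// Hypothetical immutable input record.
struct SimulationInput
{
    std::int64_t        initialStep = 0;
    std::vector<double> coordinates;
    std::vector<double> velocities;
};

// Volatile program state: owned, mutable, and always derivable from the input.
class RuntimeState
{
public:
    explicit RuntimeState(std::shared_ptr<const SimulationInput> input) : input_(std::move(input))
    {
        reset();
    }

    // (Re)initialize from the immutable record; safe to call repeatedly.
    void reset()
    {
        step_        = input_->initialStep;
        coordinates_ = input_->coordinates;
        velocities_  = input_->velocities;
    }

    // Placeholder update (x += dt * v), standing in for a real integrator.
    void advance(double dt)
    {
        assert(coordinates_.size() == velocities_.size());
        for (std::size_t i = 0; i < coordinates_.size(); ++i)
        {
            coordinates_[i] += dt * velocities_[i];
        }
        ++step_;
    }

    std::int64_t               step() const { return step_; }
    const std::vector<double>& coordinates() const { return coordinates_; }

private:
    std::shared_ptr<const SimulationInput> input_;
    std::int64_t                           step_ = 0;
    std::vector<double>                    coordinates_;
    std::vector<double>                    velocities_;
};
} // namespace sketch
```

The input record is never written to, so `reset()` always restores the same starting point no matter how far the state has advanced.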

Coordinate a Memento, or publish a light-weight (opaque) handle to simulator output or a checkpoint, without baking in details of data locality or structure.
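
The Memento idea can be illustrated with an opaque handle. The simulator creates it and can restore from it, while every other caller can only hold, copy, and pass it back. In this sketch (all names hypothetical) the checkpoint data happens to live in memory. Because callers never see `Data`, it could instead be backed by a file or remote storage without changing their code. `Simulator::run` is placeholder dynamics.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

namespace sketch
{
class Simulator;

// Opaque checkpoint handle: callers can hold and copy it, but not inspect it.
class CheckpointHandle
{
public:
    std::int64_t step() const; // the only public query

private:
    friend class Simulator;
    struct Data
    {
        std::int64_t        step;
        std::vector<double> coordinates;
    };
    explicit CheckpointHandle(std::shared_ptr<const Data> data) : data_(std::move(data)) {}

    std::shared_ptr<const Data> data_;
};

inline std::int64_t CheckpointHandle::step() const
{
    return data_->step;
}

class Simulator
{
public:
    explicit Simulator(std::vector<double> coordinates) : coordinates_(std::move(coordinates)) {}

    // Placeholder dynamics: advance the step and shift every coordinate by 1.0.
    void run(std::int64_t nsteps)
    {
        step_ += nsteps;
        for (double& x : coordinates_)
        {
            x += 1.0;
        }
    }

    CheckpointHandle checkpoint() const
    {
        return CheckpointHandle(std::make_shared<CheckpointHandle::Data>(CheckpointHandle::Data{ step_, coordinates_ }));
    }

    void restore(const CheckpointHandle& handle)
    {
        assert(handle.data_);
        step_        = handle.data_->step;
        coordinates_ = handle.data_->coordinates;
    }

    std::int64_t               step() const { return step_; }
    const std::vector<double>& coordinates() const { return coordinates_; }

private:
    std::int64_t        step_ = 0;
    std::vector<double> coordinates_;
};
} // namespace sketch
```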

Module level

Interactions between GROMACS internal modules and the new API facilities or supporting infrastructure.

(Re)initialize internal state.

Dump internal state.

Confirm input validity.

Register information or collaboration dependencies.

Register, publish, or be able to describe available outputs.
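
The module-level interactions above could map onto an interface along these lines. This is a hypothetical sketch and does not reflect the existing MdModules interfaces. The string-keyed `InputView` and the `ThermostatModule` example are assumptions made to keep it self-contained.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

namespace sketch
{
// Hypothetical string-keyed view of the input, standing in for a structured record.
using InputView = std::map<std::string, double>;

class ISimulationModule
{
public:
    virtual ~ISimulationModule() = default;
    // Confirm input validity; an empty result means the input is acceptable.
    virtual std::vector<std::string> validate(const InputView& input) const = 0;
    // (Re)initialize internal state from validated input.
    virtual void reinitialize(const InputView& input) = 0;
    // Dump internal state in a human-readable form.
    virtual void dumpState(std::ostream& out) const = 0;
    // Describe the outputs this module can publish.
    virtual std::vector<std::string> availableOutputs() const = 0;
};

class ThermostatModule : public ISimulationModule
{
public:
    std::vector<std::string> validate(const InputView& input) const override
    {
        std::vector<std::string> errors;
        const auto               it = input.find("ref-t");
        if (it == input.end())
        {
            errors.push_back("thermostat: missing ref-t");
        }
        else if (it->second <= 0.0)
        {
            errors.push_back("thermostat: ref-t must be positive");
        }
        return errors;
    }
    void reinitialize(const InputView& input) override
    {
        assert(validate(input).empty());
        referenceTemperature_ = input.at("ref-t");
    }
    void dumpState(std::ostream& out) const override { out << "ref-t=" << referenceTemperature_; }
    std::vector<std::string> availableOutputs() const override { return { "temperature" }; }

private:
    double referenceTemperature_ = 0.0;
};

// Collect problems from every module before initializing any of them.
std::vector<std::string> validateAll(const std::vector<std::unique_ptr<ISimulationModule>>& modules,
                                     const InputView&                                       input)
{
    std::vector<std::string> errors;
    for (const auto& entry : modules)
    {
        const auto moduleErrors = entry->validate(input);
        errors.insert(errors.end(), moduleErrors.begin(), moduleErrors.end());
    }
    return errors;
}
} // namespace sketch
```

Separating `validate` from `reinitialize` lets a caller reject an input for all modules before any internal state is touched.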

Additional goals

Distinguish between (immutable) input and (mutable) program state (clarify stages of initialization, reform inputrec use cases).

Clarify the information hierarchy represented by SimulationInput (and SimulationOutput).

Maximize reusability of the MD runner:

- Allow SimulationInput to be reapplied within a single process lifetime.
- Identify reusable resources or data structures that do not need reinitialization.
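
A sketch of runner reuse under this split: input-independent resources are set up once per runner, while per-input state is rebuilt on every run. `Runner`, `ReusableResources`, and the simplified `SimulationInput` are hypothetical, and the step loop is a placeholder for MD.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

namespace sketch
{
// Hypothetical input record.
struct SimulationInput
{
    std::int64_t        nsteps = 0;
    std::vector<double> coordinates;
};

// Stands in for expensive, input-independent resources (thread pools, device contexts, ...).
class ReusableResources
{
public:
    void ensureInitialized()
    {
        if (!initialized_)
        {
            ++initializationCount_; // expensive setup would happen here, once
            initialized_ = true;
        }
    }
    int initializationCount() const { return initializationCount_; }

private:
    bool initialized_         = false;
    int  initializationCount_ = 0;
};

class Runner
{
public:
    // Each call may apply a different input; only per-input state is rebuilt.
    std::int64_t run(const SimulationInput& input)
    {
        assert(input.nsteps >= 0);
        resources_.ensureInitialized();
        // Per-input state; assign() can reuse the buffer's existing capacity.
        workBuffer_.assign(input.coordinates.begin(), input.coordinates.end());
        std::int64_t stepsDone = 0;
        for (std::int64_t step = 0; step < input.nsteps; ++step)
        {
            ++stepsDone; // placeholder for an MD step
        }
        return stepsDone;
    }
    int resourceInitializations() const { return resources_.initializationCount(); }

private:
    ReusableResources   resources_;
    std::vector<double> workBuffer_;
};
} // namespace sketch
```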

Define the SimulationState encapsulation, or coordinate with its road map.

Deferred

To further clarify the scope of this issue, identify related tasks that should have a more explicit road map, but which are (currently) considered beyond the scope of this feature topic.

- Decouple Mdrunner collaborations from assumptions of file-based I/O (remove the ArrayRef<const t_filenm> from gmx::Mdrunner).
- Modernize/unify run time simulation options handling (#2877 (closed)).
- Clean up the mdrun call hierarchy and program flow (input aggregation, acquisition of run time resources, component initialization and binding, creation protocols, “runner” versus “simulator”).
- Decouple Mdrunner from membed and essential-dynamics implementation details.
- Logging abstraction (#2999 (closed)).

Tasks

Use the new SimulationInput abstraction as the focal point for restructuring simulation setup and simulator initialization in flexible API-friendly ways. Work towards clearer representations of prescribed work while decoupling from specific file formats. Allow lighter weight representations and transformations of simulation input for ensemble methods and other many-simulation workflows.

A complete model of the information hierarchy that makes up SimulationInput should be explored, but one is neither necessary for, nor likely to emerge from, near-term efforts.

Criteria for completion

This issue may remain open as long as it is a useful road map, but it can likely be considered “resolved” when the API use cases that support the targeted applications are well understood and are either implemented or tracked independently on another road map.

(from redmine: issue id 3379, created on 2020-02-13 by eirrgang)

Relations:
- relates #3286 (closed)
- relates #3285 (closed)
- relates #3433 (closed)
- relates #3422 (closed)
- child #3374 (closed)
- child #3439 (closed)

Edited Mar 08, 2022 by M. Eric Irrgang