Clarify execution phases for MD simulation - Redmine #2375
In support of a roadmap to an API layer between the user interface and the MD simulation machinery, one early set of tasks is to tighten up various changes of execution state and clarify the seams for client-library interaction.
Basic features of API-driven simulation are to
- allow calling code (client) to provide or modify initial state
- provide or modify simulation parameters and runtime parameters
- directly access trajectory data during or after a simulation without expensive filesystem I/O
* API abstraction for trajectory output and/or checkpointing
* API-provided MDModule or call-back access to system snapshots
* abstract or at least convertible representation of initial and final simulation state before and after performing the specified MD integration.
The above can be extracted to a parent issue at some point, but the
current issue is intended to address the first bullet: define a roadmap
to allow API client code to provide and/or clearly understand the
initial state of a simulation. It is also an excuse to start clarifying
phases of program execution such that non-user-interface aspects of
`mdrun` can be compartmentalized into the library, allowing consistent
semantics between CLI and other API-driven work. This probably involves
work to encapsulate or reconsider the ownership relationships of things
like modules, command-line options, and initialization defaults.
To start discussion, I would propose the following sequence of changes to submit.
- Make minor updates where state is loaded by `read_tpx` to indicate more clearly that it is preliminary, pending checkpoint loads.
- Remove or rework dependent code that requires late checkpoint loads. This potentially includes changes to what is stored in the checkpoint, such as parallelism runtime details.
- Move checkpoint load earlier and clearly establish initial state.
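To make the proposed sequence concrete, here is a minimal C++17 sketch of the idea: state read from the run input is typed as preliminary, and only becomes the initial state once the checkpoint question has been settled. Every name here (`SimState`, `PreliminaryState`, `InitialState`, `loadRunInput`, `establishInitialState`) is hypothetical and chosen for illustration; none of it is existing GROMACS API, and `loadRunInput` merely stands in for the `read_tpx` step.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for simulation state; not a GROMACS type.
struct SimState
{
    long        step = 0;
    std::string origin; // where the state came from, for illustration only
};

// State as read from the run input file. The type itself signals that
// a checkpoint may still supersede it.
class PreliminaryState
{
public:
    explicit PreliminaryState(SimState s) : state_(std::move(s)) {}
    const SimState& peek() const { return state_; }

private:
    friend class InitialState;
    SimState state_;
};

// State after the checkpoint decision has been made. Code that needs the
// real starting point takes this type, so it cannot run too early.
class InitialState
{
public:
    // A checkpoint, if present, replaces the preliminary state.
    InitialState(PreliminaryState prelim, std::optional<SimState> checkpoint) :
        state_(checkpoint ? std::move(*checkpoint) : std::move(prelim.state_))
    {
        assert(state_.step >= 0); // a starting step is never negative
    }
    const SimState& get() const { return state_; }

private:
    SimState state_;
};

// Hypothetical phase 1: load the run input (stands in for read_tpx).
inline PreliminaryState loadRunInput()
{
    return PreliminaryState(SimState{ 0, "tpr" });
}

// Hypothetical phase 2: resolve the checkpoint early and fix the initial state.
inline InitialState establishInitialState(PreliminaryState prelim, std::optional<SimState> checkpoint)
{
    return InitialState(std::move(prelim), std::move(checkpoint));
}
```

With this shape, a client would call `establishInitialState(loadRunInput(), std::nullopt)` for a fresh run, or pass a checkpointed `SimState` to restart. Encoding the phase in the type is only one possible design; a runtime flag on a single state object would be a smaller change but would not let the compiler catch early use.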
Other changes, to be addressed in separate Redmine issues, include modernizing and extracting the command-line arguments (a whole other can of worms) behind something like the MdpOptionsProvider interface. As with the checkpoint data above, these may require discussion of what is a simulation parameter versus an execution parameter.
(from redmine: issue id 2375, created on 2018-01-07 by eirrgang)
- Relations:
- relates #3040 (closed)
- blocks #2605 (closed)
- Changesets:
- Revision 88c7ed2d by Mark Abraham on 2019-05-22T18:54:32Z:
Introduce tri-state enums for restarts
Both the user choice for appending and the decision about how to
implement a restart are good to express as a three-way enumeration of
mutually-exclusive possibilities, rather than booleans.
Checkpoint restarts also need to consider whether KE quantities
need to be recomputed, which is now stored in t_ekinstate alongside
the data from which it was computed.
Together, these eliminate the ContinuationOptions struct.
Several booleans in implementation objects were renamed to be
consistent with the StartingBehavior enumeration values, so that the
code is easier to understand.
Moved the call to handleRestart out of updateFromCommandLine now that
it no longer needs to be there.
Used namespaces for handlerestart.cpp
Refs #2804, #2375
Change-Id: I1128b94e947c6ef355a1b137b8978faa227ab1a0
- Revision 916efac1 by Mark Abraham on 2019-05-22T20:44:14Z:
Rewrite starting behavior
Several necessary checks were deferred until the time the checkpoint
was read, which made this feature hard to implement and
understand. Having moved the code, if it will not be possible to
append, we can tell the user immediately.
Fixed a bug where mdrun -append would start from the .tpr
configuration when the checkpoint file was missing.
Opening the logfile in the non-appending cases is now
closely associated with the logic for how the restart
works.
Refs #2804, #2375
Change-Id: I83a846958619e72ddc9a5e9bae49a9b71221ad24
- Revision b7b078e2 by Mark Abraham on 2019-09-11T08:45:01Z:
Fix multi-sim restart handling in corner cases
If different simulations would have different starting behaviour,
e.g. some checkpoint files are found and some are not, then we should
not allow a restart, and do so with a useful error message.
Refs #2375
Change-Id: I8845784e8310ab6ca81db189e4a42754add03def
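
The ideas in these changesets can be sketched roughly as follows: a three-way enumeration for the user's appending choice, a three-way enumeration for the resulting starting behaviour, an up-front decision that fails immediately when appending is impossible, and a check that all members of a multi-simulation agree. This is an illustrative sketch, not the code from the commits. The enumerator names, the function names `chooseStartingBehavior` and `requireConsistentStart`, the boolean inputs, and the exact policy (in particular what the automatic choice does when appending is not possible) are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <vector>

// What the user asked for. Enumerator names are illustrative assumptions.
enum class AppendingBehavior { Auto, Appending, NoAppending };

// How the run will actually start. Enumerator names are illustrative assumptions.
enum class StartingBehavior { NewSimulation, RestartWithAppending, RestartWithoutAppending };

// Decide the starting behaviour before any checkpoint contents are read,
// so that an impossible append request is reported immediately.
// The two boolean inputs are hypothetical simplifications.
inline StartingBehavior chooseStartingBehavior(AppendingBehavior userChoice,
                                               bool              checkpointFound,
                                               bool              outputFilesAppendable)
{
    if (!checkpointFound)
    {
        return StartingBehavior::NewSimulation;
    }
    switch (userChoice)
    {
        case AppendingBehavior::NoAppending: return StartingBehavior::RestartWithoutAppending;
        case AppendingBehavior::Appending:
            if (!outputFilesAppendable)
            {
                throw std::runtime_error("Appending was requested but is not possible");
            }
            return StartingBehavior::RestartWithAppending;
        case AppendingBehavior::Auto:
            // Assumed policy for this sketch: append when possible.
            return outputFilesAppendable ? StartingBehavior::RestartWithAppending
                                         : StartingBehavior::RestartWithoutAppending;
    }
    throw std::logic_error("Unhandled AppendingBehavior value");
}

// For a multi-simulation, refuse to continue unless every member
// would start the same way. Requires at least one simulation.
inline StartingBehavior requireConsistentStart(const std::vector<StartingBehavior>& perSimulation)
{
    assert(!perSimulation.empty());
    const StartingBehavior first    = perSimulation.front();
    const bool             allAgree = std::all_of(perSimulation.begin(), perSimulation.end(),
                                      [first](StartingBehavior b) { return b == first; });
    if (!allAgree)
    {
        throw std::runtime_error(
                "Simulations disagree on starting behaviour, e.g. some checkpoint "
                "files were found and some were not");
    }
    return first;
}
```

Compared with a pair of booleans, each enumeration makes the mutually exclusive cases explicit, and the `switch` forces every case to be handled in one place.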