
Parrinello-Rahman checkpoint restart fails using modular simulator - Redmine #3377

The Parrinello-Rahman barostat does not allow restarts from a checkpoint file when using modular simulator.

Fatal error:
Cannot change a simulation algorithm during a checkpoint restart. Perhaps you
should make a new .tpr with grompp [...]

This error is thrown by the checkpoint loading routine. While the legacy implementation of the P-R barostat required the pressure at the previous step to be checkpointed, the modular implementation does not require this. load_checkpoint is, however, expecting this field to be present and throws an error.

While resolving this, another error was found. When initializing the modular simulator, the Parrinello-Rahman scaling could be applied one step too late under specific circumstances, namely if a checkpoint was taken exactly on a scaling step and the simulation was later restarted from it. Since checkpoint reading was not working, this can never have happened in practice, but it needs to be fixed as well.

The steps to resolve this problem are as follows:

1. Move the decision on whether to use modular simulator before checkpoint reading. This change will likely be useful very soon anyway, as checkpoint reading is planned to be modularized.
2. Fix the initialization of the scaling.
3. Fix the reading of checkpoints using Parrinello-Rahman.
4. Add the missing tests: this bug could only go unnoticed because there were no tests for continuation using Parrinello-Rahman and md-vv (which was previously not implemented).

(from redmine: issue id 3377, created on 2020-02-13 by ptmerz, closed on 2020-02-21)

Changesets:
Revision 2078fd06 on 2020-02-13T02:45:35Z:

Expose vsite counting

This allows checking whether vsites are present before the
respective object is created.
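
The idea of answering "does this system contain vsites?" from the topology alone, without a vsite-handling object, can be sketched as follows. This is a hypothetical, simplified C++ illustration: `MoleculeType`, `MoleculeBlock`, `Topology`, `countVsiteInteractions` and `systemHasVsites` are invented for this example and are not GROMACS API (the later change relies on `gmx_mtop_interaction_count` for this). It assumes each molecule type records its number of vsite interactions, so the system total is a sum over blocks of identical molecules.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified topology: each molecule type stores how many
// virtual-site interactions one molecule of that type contains.
struct MoleculeType
{
    std::size_t numVsiteInteractions;
};

// A block of identical molecules of one type.
struct MoleculeBlock
{
    std::size_t moleculeTypeIndex;
    std::size_t numMolecules;
};

struct Topology
{
    std::vector<MoleculeType>  moleculeTypes;
    std::vector<MoleculeBlock> moleculeBlocks;
};

// Counts vsite interactions in the whole system from topology data only,
// so the answer is available before any vsite-handling object exists.
std::size_t countVsiteInteractions(const Topology& top)
{
    std::size_t total = 0;
    for (const MoleculeBlock& block : top.moleculeBlocks)
    {
        assert(block.moleculeTypeIndex < top.moleculeTypes.size());
        total += block.numMolecules
                 * top.moleculeTypes[block.moleculeTypeIndex].numVsiteInteractions;
    }
    return total;
}

bool systemHasVsites(const Topology& top)
{
    return countVsiteInteractions(top) > 0;
}
```

For example, 50 molecules of a type with one vsite interaction plus 10 molecules of a type without any give a count of 50, so `systemHasVsites` returns true.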

Refs #3377 (prepares point 1)

Change-Id: I8273daf38d46e2f052573f48323b5b6137965e9f

Revision 42ba62e1 on 2020-02-14T02:08:06Z:

Move modular simulator decision before checkpoint loading

Currently, the decision on whether to use modular simulator is done
relatively late during the runner stage. This makes it impossible to
allow for different behavior at checkpoint loading time. The current
change therefore moves this decision before checkpoint loading time.
To achieve this, some adaptations were needed:

* Use gmx_mtop_interaction_count to determine whether virtual sites
  will be used before the respective object is created.
* The membrane embedding check via pointer is replaced by a boolean
  set earlier during the runner phase.
* The essential dynamics check was split to catch command line inputs
  during the runner phase, and mismatching checkpointing data during
  the simulator phase (mirroring legacy behavior in do_md()).
* Replace the ensemble restraint check by a low-level alternative
  for the early runner call (mimicking the distance restraint
  initialization), while keeping the current check for the
  simulator-level call. Note that as multi-simulations are disabled,
  this low-level test will effectively never fail, but the additional
  clarity is helpful in further development. The later test ensures
  that changes to init_disres() don't make this check invalid: if the
  two checks ever got out of sync, the simulations would exit with a
  fatal error.
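
A minimal sketch of the reordered decision is shown below. All names (`EarlyRunnerInfo`, `canUseModularSimulator`, `SimulatorChoice`) are hypothetical, and the sketch assumes, as the checks listed above suggest, that each of these features rules out modular simulator. The point it illustrates is that every input is known from the .tpr file and the command line, so the decision can be recorded before checkpoint loading; code that asks for it earlier trips an assertion. It uses C++17 `std::optional`.

```cpp
#include <cassert>
#include <optional>

// Hypothetical summary of what the runner knows before reading a checkpoint;
// every field comes from the .tpr file or the command line.
struct EarlyRunnerInfo
{
    bool integratorSupported;              // e.g. md-vv (assumed supported case)
    bool hasVirtualSites;                  // counted from the topology, no vsite object
    bool doMembraneEmbedding;              // boolean set early, not a pointer check
    bool essentialDynamicsFromCommandLine; // ED requested via command-line input
    bool usesEnsembleRestraints;           // low-level check mimicking disres setup
};

// Assumption for this sketch: each listed feature rules out modular simulator.
bool canUseModularSimulator(const EarlyRunnerInfo& info)
{
    return info.integratorSupported && !info.hasVirtualSites && !info.doMembraneEmbedding
           && !info.essentialDynamicsFromCommandLine && !info.usesEnsembleRestraints;
}

// Records the decision once; reading it before it was made (for example
// during checkpoint loading) is a programming error caught by the assertion.
class SimulatorChoice
{
public:
    void decide(const EarlyRunnerInfo& info) { useModular_ = canUseModularSimulator(info); }

    bool useModularSimulator() const
    {
        assert(useModular_.has_value() && "decide before loading the checkpoint");
        return *useModular_;
    }

private:
    std::optional<bool> useModular_;
};
```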

Refs #3377 (fixes point 1)

Change-Id: I635e033db51d6ecc8bf121c72730a121e04586dd

Revision 009ed957 on 2020-02-14T02:10:53Z:

Fix Parrinello-Rahman scaling on initial step (modular simulator)

If Parrinello-Rahman scaling was requested on the first step, it was
not properly initialized. The setup routine would have correctly
(although non-obviously) calculated the scaling matrix, but would have
requested the propagator to apply the scaling one step too late.

For new simulations, this never happens (since scaling happens on the
second step, not the first). It could, however, lead to slight
errors if restarting from a checkpoint occurred exactly on a scaling
step. As restarting from Parrinello-Rahman simulations using modular
simulator was broken anyway, we can be sure that this has never
happened in practice.

This change fixes the bug, adds explanations of what happens on the
initial step, and makes the function calls more explicit (at the cost
of a very small amount of code duplication).
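
The off-by-one can be illustrated with a small, hypothetical scheduling model. It assumes that scaling steps are those with `step % period == offset` (with `offset == 1`, a run starting at step 0 first scales on its second step, matching the description above) and that the setup routine tells the propagator on which step to apply the first scaling. The functions and parameters are illustrative only, not the modular simulator's actual interface.

```cpp
#include <cassert>
#include <cstdint>

using Step = std::int64_t;

// Hypothetical schedule: scaling happens on steps with step % period == offset.
bool isScalingStep(Step step, Step period, Step offset)
{
    assert(step >= 0 && period > 0 && offset >= 0 && offset < period);
    return step % period == offset;
}

// First scaling step strictly after `step`.
Step nextScalingStepAfter(Step step, Step period, Step offset)
{
    Step next = step + 1;
    while (!isScalingStep(next, period, offset))
    {
        ++next;
    }
    return next;
}

// Buggy behaviour (modelled on the description): if the run starts on a
// scaling step, the matrix is computed but applied one step later.
Step firstApplicationStepBuggy(Step initialStep, Step period, Step offset)
{
    return isScalingStep(initialStep, period, offset)
                   ? initialStep + 1
                   : nextScalingStepAfter(initialStep, period, offset);
}

// Fixed behaviour: a run starting on a scaling step applies it immediately.
Step firstApplicationStep(Step initialStep, Step period, Step offset)
{
    return isScalingStep(initialStep, period, offset)
                   ? initialStep
                   : nextScalingStepAfter(initialStep, period, offset);
}
```

With `period = 10` and `offset = 1`, a new run from step 0 gives step 1 in both variants; only a restart exactly on a scaling step such as 21 differs (21 when fixed, 22 with the bug).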

Refs #3377 (fixes point 2)

Change-Id: Ic3ba7ba078260a9d039d506fc0a87353f80d23dd

Revision ca8f9e41 on 2020-02-14T02:14:41Z:

Fix reading of checkpoints with Parrinello-Rahman (modular simulator)

When using modular simulator, simulations using the Parrinello-Rahman
barostat could not be restarted from a checkpoint; the checkpoint
loading routine threw an error. While the legacy implementation of the P-R barostat
required the pressure at the previous step to be checkpointed, the
modular implementation does not require this. load_checkpoint is,
however, expecting this field to be present and throws an error.

This change fixes this by setting the globalState flags depending on
whether the modular simulator will be used, so that read_checkpoint no
longer expects this entry.
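
A hypothetical sketch of the fix: the set of state entries expected in the checkpoint is derived from the simulator decision, which is available because it is now made before checkpoint loading. `StateEntry`, `globalStateFlags` and `checkCheckpointFlags` are invented names, and the real GROMACS state and checkpoint code track more entries and differ in detail; only the previous-step pressure dependence is taken from the description above. The error string mirrors the fatal error quoted in the issue.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical bit flags for entries of the global state written to a checkpoint.
enum StateEntry : std::uint32_t
{
    StateX            = 1U << 0, // positions
    StateV            = 1U << 1, // velocities
    StateBox          = 1U << 2, // simulation box
    StatePressurePrev = 1U << 3, // previous-step pressure (legacy P-R only)
};

// Entries the state (and hence the checkpoint) will contain. The legacy P-R
// barostat needs the previous-step pressure; the modular one does not.
std::uint32_t globalStateFlags(bool useParrinelloRahman, bool useModularSimulator)
{
    std::uint32_t flags = StateX | StateV | StateBox;
    if (useParrinelloRahman && !useModularSimulator)
    {
        flags |= StatePressurePrev;
    }
    return flags;
}

// Refuses to continue when the entries in the file differ from those expected.
void checkCheckpointFlags(std::uint32_t expected, std::uint32_t inFile)
{
    if (expected != inFile)
    {
        throw std::runtime_error(
                "Cannot change a simulation algorithm during a checkpoint restart.");
    }
}
```

With `useModularSimulator` known up front, a checkpoint written by the modular P-R simulator (without the previous-step pressure) matches the expected flags; computing the flags as for the legacy simulator reproduces the mismatch that caused the fatal error.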

Note that tests ensuring that this bug does not reappear are
introduced in the child change I3bcd0729.

Refs #3377 (fixes point 3)

Change-Id: If8afd294b8c79ceef66e71293d9d93cf2f7d0df8

Revision 3d9ea0b8 on 2020-02-14T04:26:06Z:

Expand test coverage of exact continuation tests

The exact continuation tests were not covering the new
Parrinello-Rahman functionality of modular simulator, nor the
Berendsen-Berendsen NPT case using md-vv. This change fixes this.
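
The general idea of an exact continuation test can be illustrated with a toy model: run N steps in one go, run the same N steps again as two parts with a "checkpoint" in between, and require identical results. Everything here (`ToyState`, `run`, `continuationIsExact`, the rule that scaling happens when `step % period == 1`) is hypothetical and far simpler than the GROMACS tests. The optional buggy restart applies a scaling due on the restart step one step late, a simplified stand-in for the initialization bug fixed in this issue.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy state: one coordinate and velocity plus the step count.
struct ToyState
{
    std::int64_t step = 0;
    double       x    = 1.0;
    double       v    = 0.5;
};

// Toy rule (assumption): "box scaling" shrinks x on steps with step % period == 1.
bool isScalingStep(std::int64_t step, std::int64_t period)
{
    return step % period == 1;
}

// Advances `s` by `nsteps`. If delayInitialScaling is true and the run starts
// on a scaling step, that scaling is applied one step late (the toy bug).
ToyState run(ToyState s, std::int64_t nsteps, std::int64_t period, bool delayInitialScaling)
{
    const std::int64_t initialStep = s.step;
    bool               delayed     = false;
    for (std::int64_t i = 0; i < nsteps; ++i)
    {
        const bool scaleNow = isScalingStep(s.step, period);
        if (delayInitialScaling && scaleNow && s.step == initialStep)
        {
            delayed = true; // buggy: postpone the scaling to the next step
        }
        else if (scaleNow || delayed)
        {
            s.x *= 0.99;
            delayed = false;
        }
        s.x += 0.01 * s.v; // toy position update
        ++s.step;
    }
    return s;
}

// Runs `total` steps at once and as two parts split at step `split`,
// then compares the final states exactly.
bool continuationIsExact(std::int64_t total, std::int64_t split, std::int64_t period, bool buggyRestart)
{
    assert(period >= 2 && split >= 0 && split <= total);
    const ToyState reference  = run(ToyState{}, total, period, false);
    const ToyState checkpoint = run(ToyState{}, split, period, false);
    const ToyState continued  = run(checkpoint, total - split, period, buggyRestart);
    return continued.step == reference.step && continued.x == reference.x;
}
```

Without the bug, both paths perform the same floating-point operations in the same order, so the comparison is exact. With the buggy restart, the test only fails when the split falls exactly on a scaling step (for example step 21 with `period = 10`, but not step 25), which is why such coverage has to include restarts on scaling steps.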

Fixes #3377 (fixes point 4, last task on the list)

Change-Id: I3bcd072969259383dd1812d425dd7b3baee5bd85