segmentation fault from ~LegacyMdrunOptions() - Redmine #3081
Only seems to occur in simulations launched through Python gmxapi. The problem was not obvious (only occurred in MPI-enabled GROMACS) before https://gerrit.gromacs.org/c/gromacs/+/12982
Initial investigation indicates done_commrec is either called multiple times or lacks sufficient checking for a nullptr. The lifetime, ownership, and copy safeness of LegacyMdrunOptions are not well specified, and it is not clear that ownership of a commrec instance is transferred to the LegacyMdrunOptions instance.
Two possible solutions that work in initial testing are to (a) add nullptr checks to done_commrec, or (b) to ~LegacyMdrunOptions.
Suggestions on how to manage commrec lifetime would be appreciated.
Otherwise, I will do a little more testing and submit a conservative change that adds checking in both places.
(from redmine: issue id 3081, created on 2019-09-09 by eirrgang, closed on 2019-10-16)
- Changesets:
- Revision a363c7ccec1e51e29afe749e40e58159b6b495f8 by Mark Abraham on 2019-09-10T10:21:46Z:
Use more RAII semantics with t_commrec
This permitted simplifying building the Mdrunner
Fixes #3081
Change-Id: I4ac5fb2017d960d3d0cfd0103e1d7232da9f1c3a
- Revision 51973164 by Mark Abraham on 2019-09-10T12:57:08Z:
Use more RAII semantics with t_commrec
This permitted simplifying building the Mdrunner
Fixes #3081
Change-Id: I4ac5fb2017d960d3d0cfd0103e1d7232da9f1c3a
- Revision 2ec3cd52 by Mark Abraham on 2019-09-27T15:27:45Z:
Improve MiMiC test implementation
These tests were not using callGrompp in the intended way, and
inadvertently relying on behavior that might change when follow-up
work on #3081 changes mdrun thread-MPI initialization behavior
Change-Id: Icdbb6049a7b609ddaabc8e073f87e8bbb00da4f2
- Revision e39949c7 by Mark Abraham on 2019-10-01T05:24:02Z:
Removed dependency on commrec of mdrun setup
Changes no functionality.
Setup is now parameterized directly on MPI_COMM_WORLD, which we will
want later for letting library-based callers pass in an
MPI_Communicator. This permits commrec to be initialized later, once
the threads have been spawned for the thread-MPI ranks.
The initialization of multi-simulations moves from LegacyMdrunOptions
to SimulationContext, which is more appropriate for
ensemble-parallelism established directly by the user.
Before the decision about the duty of a rank, there is no difference
between MASTER(cr) and SIMMASTER(cr), so several calls to macros
taking a t_commrec pointer are replaced by booleans. Introduced
findIsSimulationMasterRank to compute that value. This eliminates
early use of t_commrec that has necessitated other hacks and
workarounds.
Removed redundant check for replica exchange when the number of multi
simulations is less than two, because gmx_multisim_t constructor
already prohibits that.
Resolves several TODO items and improves modularity, too.
Refs #2587, #2605, #3081
Change-Id: I48bd3b713bc181b5c1e4cbcd648706a9f00eab96