re-design benchmarking functionality - Redmine #1781
mdrun counter reset does not interact well with PME tuning. We should
probably delay counter reset until after tuning completes, and have a
better interface, e.g. mdrun -benchmarksteps 1000
and/or
mdrun -benchmarktime 0.05
See discussion beginning at comment 6.
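To illustrate the proposed -benchmarksteps behaviour, here is a minimal C++ sketch. It makes two assumptions:

- mdrun can query, at every step, whether PME tuning is still active.
- Counters are reset at the first step where tuning is no longer active.

BenchmarkAction and BenchmarkController are hypothetical names and not part of GROMACS.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Possible outcomes of the per-step decision (hypothetical, not a GROMACS type).
enum class BenchmarkAction
{
    Continue,      // keep simulating, nothing special to do
    ResetCounters, // reset all time and cycle counters before this step
    Stop           // the requested number of benchmark steps has been timed
};

// Hypothetical controller for a proposed "mdrun -benchmarksteps N":
// counters are reset at the first step where PME tuning is no longer
// active, and the run stops once N steps have been timed after that.
class BenchmarkController
{
public:
    explicit BenchmarkController(std::int64_t benchmarkSteps) : benchmarkSteps_(benchmarkSteps)
    {
        assert(benchmarkSteps > 0);
    }

    // Called once at the start of every MD step.
    BenchmarkAction onStep(std::int64_t step, bool pmeTuningActive)
    {
        if (!resetStep_)
        {
            if (pmeTuningActive)
            {
                // Never reset while tuning: its timings would pollute the benchmark.
                return BenchmarkAction::Continue;
            }
            resetStep_ = step;
            return BenchmarkAction::ResetCounters;
        }
        // After the reset, the tuning flag is ignored.
        // Steps resetStep_ .. resetStep_ + benchmarkSteps_ - 1 are timed.
        if (step - *resetStep_ >= benchmarkSteps_)
        {
            return BenchmarkAction::Stop;
        }
        return BenchmarkAction::Continue;
    }

private:
    std::int64_t                benchmarkSteps_;
    std::optional<std::int64_t> resetStep_; // empty until counters have been reset
};
```

The sketch reduces tuning to a single boolean. The log below mentions a "first phase" of PME tuning, so a real implementation would need to wait until every phase has finished. A -benchmarktime variant could compare elapsed wall time after the reset against the limit instead of counting steps.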
Original issue follows:
The following PME LB error can be triggered by some runs:
NOTE: DLB will not turn on during the first phase of PME tuning
starting mdrun 'Water'
1000 steps, 2.0 ps.
step 80: timed with pme grid 200 100 100, coulomb cutoff 1.000: 1989.6 M-cycles
step 160: timed with pme grid 168 84 84, coulomb cutoff 1.187: 2006.8 M-cycles
step 240: timed with pme grid 144 72 72, coulomb cutoff 1.384: 2664.3 M-cycles
step 320: timed with pme grid 160 80 80, coulomb cutoff 1.246: 2205.9 M-cycles
step 400: timed with pme grid 168 84 84, coulomb cutoff 1.187: 2016.8 M-cycles
step 480: timed with pme grid 192 96 96, coulomb cutoff 1.038: 2097.2 M-cycles
optimal pme grid 200 100 100, coulomb cutoff 1.000
step 500: resetting all time and cycle counters
-------------------------------------------------------
Program gmx mdrun, VERSION 5.1-rc1
Source code file: /var/data0/sandbox/gromacs/bmdir/data/source/src/gromacs/ewald/pme-load-balancing.cpp, line: 927
Software inconsistency error:
pme_loadbal_do called at an interval != nstlist
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
Detected by the NVIDIA Perflab team, who note that:
I’ve also noticed that unless the line
NOTE: DLB can now turn on, when beneficial
appears before
step 500: resetting all time and cycle counters
I get the pme_loadbal_do error.
Original log and standard output are attached; the input is a 384k water box (but that is likely not relevant).
(from redmine: issue id 1781, created on 2015-07-15 by pszilard, closed on 2015-08-13)
- Relations:
- relates #1870 (closed)
- relates #2131 (closed)
- relates #2041 (closed)
- relates #1971 (closed)
- relates #1777 (closed)
- relates #2136 (closed)
- relates #2224 (closed)
- relates #2360 (closed)
- relates #2569 (closed)
- relates #2717 (closed)
- relates #2757 (closed)
- child #2495 (closed)
- Changesets:
- Revision 785aad1a by Mark Abraham on 2015-08-11T11:51:15Z:
Abort if PME tuning is active and counters reset
Triggering counter reset (in various ways) could happen at a
non-nstlist step, which provokes a software inconsistency error in
5.1. This reveals that all recent releases have permitted a reset
while tuning was active, which is useless and potentially wrong.
Introduced a getter pme_loadbal_is_active, so that the fatal error
can be issued when conditions for counter reset are satisfied
and PME load balancing is still active.
Noted a TODO to have the load-balancing module use its own getter in
future; such a refactoring is probably fine, but worth avoiding in a
bugfix branch. Noted a TODO to make a counter-reset module, consider
alternative solutions to #1781, and other clean-up. Documented some
stuff.
Fixes #1781
Change-Id: I912e3da837bd32280f295ad98cc6b8170f4d2d81
- Revision 7252de1a by Carsten Kutzner on 2016-11-16T16:59:06Z:
Increased the default reset step to 1500 in gmx tune_pme
Commit 785aad1a introduced a gmx_fatal() in mdrun for cases where
cycle counters are reset when PME tuning is still active. In
almost all cases, tuning takes longer than 100 steps (which was
the default at which gmx tune_pme would request mdrun to reset its
counters). This leads to gmx tune_pme reporting that all the
runs failed. Note that the small default of 100 steps dates from a time
when only DLB had to be accounted for, not PME tuning.
With the increased default, this should happen only very rarely.
For future versions it would be nicer to implement a "-benchmarksteps"
command line parameter for mdrun which resets counters exactly after
PME tuning is finished and then performs a requested number of
benchmark MD steps. Refs #1781
Change-Id: Icbcce1ecc8a23d35302c04c9a6be13c06b1be8c8
- Revision f5cb6c13 by Mark Abraham on 2018-01-11T00:41:16Z:
Announce in user log files that features are deprecated.
These are merely informational notes, not warnings or errors.
Refs #1781, #1971, #2136
Change-Id: I96e19acb0e15d3f42b0929f555b451299a2882e4
- Revision 1aa4fa40 by Mark Abraham on 2018-08-22T13:59:10Z:
Fix handling of counter resets
There is no reason or need to change max_hours, ir->nsteps or
step_rel when doing a counter reset.
This makes clear that the behaviour for the combination
mdrun -maxh t -resetstep n
matches the documentation of -maxh.
Updated the API for walltime_accounting and its usage, because
"elapsed time" is ambiguous without further context. Changed the names of
the start and stop functions so that no callers can silently rely on
semantics that have changed.
Avoided variable names such as elapsed_time and max_hours, which were
insufficiently precise.
Refs #1781
Change-Id: I16c96985f43a7b4ac75b94f378da3d05914d6986
- Revision cf2d8336 by Mark Abraham on 2018-10-13T19:53:48Z:
Deprecate various functionality in GROMACS 2019
Published a deprecation policy.
Updated the release notes to refer also to previously deprecated
features.
Announced intent to change some functionality:
* gmx mdrun -membed options (but not feature)
* gmx mdrun -rerun option (but not feature)
* integrator .mdp field will contain only integrators
* gmx do_dssp to be replaced by gmx dssp
* gmx trjconv and friends to be split and rewritten
List of newly deprecated functionality:
* conversion of aromatic rings to virtual sites
* gmx mdrun -table options (but not feature)
* gmx mdrun -gcom option and feature
* gmx mdrun -nsteps option and feature
* gmx mdrun -nsteps, -resetstep and -resethway moved to
a gmx benchmark tool
* gmx mdrun -confout removed
Also updated release notes for functionality removed in GROMACS 2019.
Refs #2495, #1781
Fixes #2569, #1925
Change-Id: I1d00859d0f15409a472984f5a65347a50c71ad17
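Two counter-reset fixes above can be summarised in a small C++ sketch:

- Revision 785aad1a refuses a reset while PME tuning is active.
- Revision 1aa4fa40 leaves -maxh, nsteps and the step counter untouched on reset.

Everything in the sketch is hypothetical. RunTimers does not mirror the real wall-time accounting API. The pmeTuningActive flag only plays the role that the pme_loadbal_is_active getter plays in the fix. Times are passed in explicitly, in seconds, so the logic is easy to check.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical sketch of counter-reset semantics (not the GROMACS API).
// Step counters and nsteps are deliberately not stored here: a reset
// must leave them unchanged, so it never needs to touch them.
class RunTimers
{
public:
    // runStartSeconds: wall time at which the run began.
    // maxHours: value of -maxh; a non-positive value means "no limit" here.
    RunTimers(double runStartSeconds, double maxHours) :
        runStart_(runStartSeconds), perfStart_(runStartSeconds), maxHours_(maxHours)
    {
    }

    // Reset the performance counters at wall time nowSeconds.
    // Resetting while PME tuning is active is an error, analogous to the
    // fatal error introduced in 785aad1a.
    void resetCounters(double nowSeconds, bool pmeTuningActive)
    {
        if (pmeTuningActive)
        {
            throw std::runtime_error("cannot reset counters while PME tuning is active");
        }
        perfStart_ = nowSeconds; // only the performance window restarts
    }

    // Wall time counted for performance reporting (since the last reset).
    double timedSeconds(double nowSeconds) const { return nowSeconds - perfStart_; }

    // The -maxh limit is always measured from the start of the run.
    // Simplified: real mdrun may stop somewhat before the exact limit.
    bool maxhReached(double nowSeconds) const
    {
        return maxHours_ > 0 && (nowSeconds - runStart_) >= maxHours_ * 3600.0;
    }

private:
    double runStart_;
    double perfStart_;
    double maxHours_;
};
```

Consider -maxh 1 with a counter reset after 600 s. In this sketch the run still reaches its limit one hour after it started, not one hour after the reset. That is our reading of the intent behind "mdrun -maxh t -resetstep n matches the documentation of -maxh".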