mdrun writes broken energy group values to .edr file - Redmine #1822
With 2 energy groups, `mdrun -nb cpu` and `mdrun -nb gpu` write .edr
files such that `gmxcheck -e cpu-run -e2 gpu-run` gives
There are 39 terms to compare in the energy files
Coulomb (SR) step 0: -11637.2, step 0: -15106.6
Potential step 0: -6509.07, step 0: -9978.4
Total Energy step 0: -6501.01, step 0: -9970.33
Coul-SR:URE-URE step 0: 9361.08, step 0: -15106.6
LJ-SR:URE-URE step 0: 336.99, step 0: 3585.22
Coul-SR:URE-SOL step 0: -3929.27, step 0: 0
LJ-SR:URE-SOL step 0: -168.613, step 0: 0
Coul-SR:SOL-SOL step 0: -17069, step 0: 0
LJ-SR:SOL-SOL step 0: 3416.84, step 0: 0
Coulomb (SR) step 1: -11657.9, step 1: -15127.2
Even with a single energy group, I get
Coulomb (SR) step 0: -11637.2, step 0: -15106.6
Potential step 0: -6509.07, step 0: -9978.41
Total Energy step 0: -6501.01, step 0: -9970.34
Coul-SR:URE-URE step 0: 9361.08, step 0: -15106.6
LJ-SR:URE-URE step 0: 336.99, step 0: 3585.22
Coul-SR:URE-rest step 0: -3929.27, step 0: 0
LJ-SR:URE-rest step 0: -168.613, step 0: 0
Coul-SR:rest-rest step 0: -17069, step 0: 0
LJ-SR:rest-rest step 0: 3416.84, step 0: 0
Coulomb (SR) step 1: -11657.9, step 1: -15127.2
Tarball with repro materials and output attached.
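For anyone wanting to triage such output by script, the mismatch lines quoted above follow a regular "term, step, value, step, value" pattern. The following is a minimal sketch of a parser for that pattern. The name `parse_mismatch` and the regular expression are illustrative assumptions based only on the lines pasted in this report; they are not part of GROMACS and may not cover every line `gmxcheck` can print.

```python
import re

# Assumed line shape, taken from the output pasted in this report:
#   "<term name> step N: <value>, step M: <value>"
LINE_RE = re.compile(
    r"^(?P<term>.+?)\s+step\s+(?P<step1>\d+):\s*(?P<v1>\S+),"
    r"\s*step\s+(?P<step2>\d+):\s*(?P<v2>\S+)$"
)


def parse_mismatch(line):
    """Return (term, step, value_file1, value_file2), or None if the line
    does not look like a mismatch line (hypothetical helper)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return (
        match.group("term"),
        int(match.group("step1")),
        float(match.group("v1")),
        float(match.group("v2")),
    )


if __name__ == "__main__":
    sample = "Coul-SR:URE-SOL step 0: -3929.27, step 0: 0"
    print(parse_mismatch(sample))
```

Lines that do not match, such as the "There are 39 terms to compare" header, return `None` and can simply be skipped.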
The GPU .log file does say “NOTE: With GPUs, reporting energy group
contributions is not supported”. (In \#1293 it was suggested that we move
or add such a note near the end of the .log file. In \#1727 the reporter
also misunderstood how to use the code.)
Since energy groups are not supported on GPUs, we should not write an
.edr file with energy groups, so that users cannot erroneously use the
incorrect data they contain. If we’re unwilling to do that, then perhaps
energy-analysis tools should have a check for “all the fields zero
except the first”.
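The “all the fields zero except the first” check could look roughly like the sketch below. It assumes the analysis tool has already collected the group-pair values of one interaction type (for example all `Coul-SR:*` terms) for a frame, in file order. The function name `only_first_group_nonzero` is hypothetical, and the test is only a heuristic: a real system with, say, uncharged groups could legitimately produce zeros, so a tool would probably warn rather than refuse.

```python
def only_first_group_nonzero(values):
    """Heuristic: True if there are several group-pair terms, the first is
    non-zero, and every later one is exactly zero (hypothetical helper)."""
    values = list(values)
    if len(values) < 2:
        # A single term carries no group-wise information to distrust.
        return False
    return values[0] != 0.0 and all(v == 0.0 for v in values[1:])


if __name__ == "__main__":
    # Coul-SR group-pair values at step 0, copied from the report above.
    cpu_coul = [9361.08, -3929.27, -17069.0]
    gpu_coul = [-15106.6, 0.0, 0.0]
    for label, vals in (("cpu", cpu_coul), ("gpu", gpu_coul)):
        if only_first_group_nonzero(vals):
            print(f"{label}: group-wise terms look unsupported (only first is non-zero)")
        else:
            print(f"{label}: group-wise terms look plausible")
```

With the values from this report, the GPU run is flagged and the CPU run is not.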
Frankly, there’s something to be said for only writing group-wise
contributions during a rerun. (Our code is likely not agile enough to be
able to call the energy-group kernels only on energy-output steps, so
even on the CPU the small overhead of energy groups is being paid every
MD step.) This would be slightly easier to do once we’ve removed the group
scheme.
*(from redmine: issue id 1822, created on 2015-09-15 by mark.j.abraham, closed on 2018-01-03)*
* Relations:
* relates #1293
* relates #1727
* Changesets:
* Revision 4a4dc78e0c059ed662ae29331fd4a6c2ad6278a2 by Erik Lindahl on 2018-01-03T12:26:05Z:
```
Don't allow multiple energy groups for GPU runs
Exit with a fatal error instead of only warning, since the
latter leads to writing data for energy groups that
is incorrect to the energy file.
Fixes #1822.
Change-Id: I34ccb10bba6d6e1350283e34ebc908c6f830baab
```
* Uploads:
* [energy-groups-issue.tgz](/uploads/b17c711f7aea052d3f4903d7436db80a/energy-groups-issue.tgz)