confusing error message when OMP_NUM_THREADS is used with GPUs - Redmine #2472
The r2018 code does not allow setting only the OpenMP thread count in a
GPU run (in thread-MPI builds). However, because the OpenMP thread-count
handling was changed and part of the reporting appears to be
short-circuited (the environment-variable-related reporting from the
omp_nthreads module does not happen), this leads to potentially
confusing error messages that lack context.
$ OMP_NUM_THREADS=2 gmx mdrun -nsteps 0
[…]
GROMACS: gmx mdrun, version 2018
Executable: /opt/tcbsys/gromacs/2018/AVX2_256/bin/gmx
Data prefix: /opt/tcbsys/gromacs/2018/AVX2_256
Working dir: /home/pszilard/projects/gromacs/testing/water-048k
Command line:
gmx mdrun -nsteps 0
Back Off! I just backed up md.log to ./#md.log.95#
Reading file topol.tpr, VERSION 4.6-beta3-dev-20121222-492378e (single precision)
Note: file tpx version 82, software tpx version 112
The number of OpenMP threads was set by environment variable OMP_NUM_THREADS to 2
-------------------------------------------------------
Program: gmx mdrun, version 2018
Source file: src/gromacs/taskassignment/resourcedivision.cpp (line 224)
Fatal error:
When using GPUs, setting the number of OpenMP threads without specifying the
number of ranks can lead to conflicting demands. Please specify the number of
thread-MPI ranks as well (option -ntmpi).
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
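The condition behind the 2018 fatal error can be modeled roughly as follows. This is a hypothetical C++ sketch, not the code in resourcedivision.cpp: `ThreadRequest`, `parseOmpNumThreads` and `checkThreadRequest` are illustrative names assumed here. It shows why `OMP_NUM_THREADS=2` alone trips the check: an inherited environment value counts as an explicitly set OpenMP thread count, while no rank count was given.

```cpp
#include <climits>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical model of the user's thread-count request; the field names are
// assumptions, not GROMACS data structures.
struct ThreadRequest
{
    bool usingGpus         = false;
    int  numThreadMpiRanks = 0; // 0 = not set (no -ntmpi)
    int  numOpenMPThreads  = 0; // 0 = not set (neither -ntomp nor OMP_NUM_THREADS)
};

// Parses an OMP_NUM_THREADS-style value; returns 0 if it is missing or not a
// positive integer that fits in an int. In real use, pass
// std::getenv("OMP_NUM_THREADS").
int parseOmpNumThreads(const char* value)
{
    if (value == nullptr)
    {
        return 0;
    }
    char*      end = nullptr;
    const long n   = std::strtol(value, &end, 10);
    if (end == value || *end != '\0' || n <= 0 || n > INT_MAX)
    {
        return 0;
    }
    return static_cast<int>(n);
}

// Rejects a GPU run in which only the OpenMP thread count is fixed, mirroring
// the condition stated in the fatal error above.
void checkThreadRequest(const ThreadRequest& req)
{
    if (req.usingGpus && req.numOpenMPThreads > 0 && req.numThreadMpiRanks == 0)
    {
        throw std::runtime_error(
                "When using GPUs, setting the number of OpenMP threads without specifying the "
                "number of ranks can lead to conflicting demands. Please specify the number of "
                "thread-MPI ranks as well (option -ntmpi).");
    }
}
```

As the error message itself suggests, also passing `-ntmpi` satisfies the check; in this model, a request with `numThreadMpiRanks > 0` passes.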
In contrast, in r2016 there is no error, and the output makes it clear that the environment variable's value is being used (a value that the user may not have set, or at least not at the time of the mdrun invocation):
GROMACS: gmx mdrun, version 2016
Executable: /opt/tcbsys/gromacs/2016/AVX2_256/bin/gmx
Data prefix: /opt/tcbsys/gromacs/2016/AVX2_256
Working dir: /home/pszilard/projects/gromacs/testing/water-048k
Command line:
gmx mdrun -nsteps 0
Running on 1 node with total 4 cores, 8 logical cores, 2 compatible GPUs
Hardware detected:
CPU info:
Vendor: Intel
Brand: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
SIMD instructions most likely to fit this hardware: AVX2_256
SIMD instructions selected at GROMACS compile time: AVX2_256
Hardware topology: Basic
GPU info:
Number of GPUs detected: 2
#0: NVIDIA GeForce GTX 1080, compute ca: 6.1, ECC: no, stat: compatible
#1: NVIDIA GeForce GTX 960, compute ca: 5.2, ECC: no, stat: compatible
Reading file topol.tpr, VERSION 4.6-beta3-dev-20121222-492378e (single precision)
Note: file tpx version 82, software tpx version 110
Changing nstlist from 10 to 40, rlist from 1 to 1.101
The number of OpenMP threads was set by environment variable OMP_NUM_THREADS to 2
Overriding nsteps with value passed on the command line: 0 steps, 0 ps
Using 2 MPI threads
Using 2 OpenMP threads per tMPI thread
2 compatible GPUs are present, with IDs 0,1
2 GPUs auto-selected for this run.
Mapping of GPU IDs to the 2 PP ranks in this node: 0,1
(from redmine: issue id 2472, created on 2018-03-29 by pszilard, closed on 2018-06-14)
- Changesets:
- Revision f0c98f46 by Szilárd Páll on 2018-06-12T13:09:17Z:
Also issue OMP_NUM_THREADS reading note to the log
The note that was meant to inform users that OMP_NUM_THREADS was setting
the number of threads in their run (as this value can be inherited from
the environment) has not been logged. It was also printed right after the
tpx reading status, making it hard to notice. Removed stderr output now
that this is no longer required.
This change makes the note easier to notice by prepending a newline, and
also issues it to the log file.
Refs #2472
Change-Id: I73fc9de5e9d747f9d7a094c6678ffc1547481b94