The Cmake code in cmake/gmxManageGPU.cmake incorrectly ends up in the
message(STATUS “Enabling single compilation unit for the CUDA non-bonded
module….”
part of the if-stmt when building with CUDA 9.9
It does this despite correctly identifying CUDA 9.0 and using only >=
3.0 cuda targets.
Since the code in cmake/gmxManageNvccConfig.cmake correctly sets
GMX_CUDA_NVCC_FLAGS regardless of doing autodetection or using user
specified SM/COMPUTE targets, the code in cmake/gmxManageGPU.cmake
macro(gmx_gpu_setup) can be simplified to:
if (GMX_GPU AND NOT GMX_CLANG_CUDA) gmx_check_if_changed(_gmx_cuda_target_changed GMX_CUDA_NVCC_FLAGS) if(_gmx_cuda_target_changed OR NOT GMX_GPU_DETECTION_DONE) if(GMX_CUDA_NVCC_FLAGS MATCHES "_2[01]") message(STATUS "Enabling single compilation unit... else() message(STATUS "Enabling multiple compilation...
(This is the problem I incorrectly targeted in issue 2560)
(from redmine: issue id 2561, created on 2018-06-26 by akesandgren, closed on 2018-08-21)
Revision 1982bad5 by Szilárd Páll on 2018-08-20T23:18:42Z:
Disable single compilation unit with CUDA 9.0CUDA 9.0 does not support CC 2.x, hence even with the default list oftargeted arch, multiple compilation units an be enabled.Fixes #2561Change-Id: I741081da39539f211fad32bda5c1f1dccc4378f9
Designs
Child items
...
Show closed items
Linked items
0
Link issues together to show that they're related or that one is blocking others.
Learn more.
grep ‘Enabling single compilation’ cmake.out
— Enabling single compilation unit for the CUDA non-bonded module.
Multiple compilation units are not compatible with CC 2.x devices, to
enable the feature specify only CC >=3.0 target architectures in
GMX_CUDA_TARGET_SM/GMX_CUDA_TARGET_COMPUTE.
(from redmine: written on 2018-07-17 by akesandgren)
Exactly, it doesn’t contain them, thus the code should take the default
of using what the CUDA supports.
And there is code in there to figure this out based on CUDA version, but
it isn’t used. The result is just ignored.
With the code shown in the original post, everything works in both
cases, with and without GMX_CUDA_TARGET_SM or
GMX_CUDA_TARGET_COMPUTE
(from redmine: written on 2018-07-17 by akesandgren)
...-- Enabling single compilation unit for the CUDA non-bonded module. Multiple compilation units are not compatible with CC 2.x devices, to enable the feature specify only CC >=3.0 target architectures in GMX_CUDA_TARGET_SM/GMX_CUDA_TARGET_COMPUTE....
which is the expected result (supporting 2.x devices with CUDA < 9
requires a single compilation unit in 2018; CC 2.x is now not supported
at all in master) if perhaps a little tricky for users to understand
whether they should act. (If they know about compute capabilities, it’s
not clear that by default GROMACS builds for all of them, and if they
don’t know about compute capabilities then nothing makes sense.)
enables multiple compilation units, as expected since that is now
possible.
With the reform proposed in #2390, then we can avoid some aspects of
such problems in future, but for now Ake’s proposal looks good to me.
Szilard can you prepare a patch, please?
(from redmine: written on 2018-07-18 by mark.j.abraham)
Exactly, it doesn’t contain them, thus the code should take the
default of using what the CUDA supports.
Sorry, forgot that CUDA 9.0 skips generating CC 2.0 code.
IIRC the reason why the current code does not just test
CUDA_NVCC_FLAGS is that the compiler can generate code for CC 2.0
despite a command line that does not contain "_2[01]". For the same
reason, the proposed change could also lead to not triggering single
compilation unit for some CUDA version <9.0 for a CUDA_NVCC_FLAGS
string that implicitly triggers CC 2.0 code generation. I’d rather make
the code explicit and add a CUDA version check to the conditional than
take the risk (or trawl through the nvcc docs).
If we test for the features supported by nvcc then we will have a good
way to guess whether it is at all capable of e.g. generating code for CC
2.0 (which Szilard identifies as a risk). If the user chooses particular
CC, then we check in cmake that this will work. If the user chooses
multiple compilation units but CC 2.0 is supported, then cmake should
fail (inconsistent input). If the user chooses nothing and CC 2.0 is
supported, then we default to single compilation unit. If the user
chooses nothing and CC 2.0 is not supported, then we default to multiple
compilation units.
I don’t see any extra value delivered by any CUDA version checks.
Different OS first support newer CC in different CUDA versions, so such
checks are brittle. They can’t help the build work with unknown future
CUDA versions. This is the same thinking behind taking advantage of
compiler and glibc feature-reporting gear, and testing functionality at
configure time, rather than hard coding specific version checks.
(from redmine: written on 2018-07-18 by mark.j.abraham)
Gerrit received a related patchset ‘1’ for Issue #2561.
Uploader: Szilárd Páll (pall.szilard@gmail.com)
Change-Id:
gromacsrelease-2018I741081da39539f211fad32bda5c1f1dccc4378f9
Gerrit URL: https://gerrit.gromacs.org/8106
(from redmine: written on 2018-07-19 by gmxdefault)
If we test for the features supported by nvcc then we will have a
good way to guess whether it is at all capable of e.g. generating code
for CC 2.0 (which Szilard identifies as a risk).
How do you suggest testing individual features of nvcc? (Writing a bunch
of execute_process calls with manual regex matching is fragile and
tedious IMO).
If the user chooses particular CC, then we check in cmake that this
will work. If the user chooses multiple compilation units but CC 2.0 is
supported, then cmake should fail (inconsistent input). If the user
chooses nothing and CC 2.0 is supported, then we default to single
compilation unit. If the user chooses nothing and CC 2.0 is not
supported, then we default to multiple compilation units.
What we don’t check for is default behavior of nvcc, e.g. when
-gencode flags are cleared. I’ve just checked and we don’t accept
empty GMX_CUDA_TARGET_* to clear the list of targeted arch, so only
the fully manual override would be able to trigger that case (and that
can be triggered regardless of how this fix is implemented). Admittedly,
this case would however be caught by a compile-time error, so there’s no
risk of incorrect results.
I don’t see any extra value delivered by any CUDA version checks.
Check. a Single one would be needed for this specific fix:
if((CUDA_VERSION VERSION_LESS "9.0") AND ((NOT GMX_CUDA_TARGET_SM AND NOT GMX_CUDA_TARGET_COMPUTE) OR (GMX_CUDA_TARGET_SM MATCHES "2[01]" OR GMX_CUDA_TARGET_COMPUTE MATCHES "2[01]")))
and this doesn’t need to change later as CC 2.0 support won’t be
re-introduced later, I think.
Different OS first support newer CC in different CUDA versions, so
such checks are brittle. They can’t help the build work with unknown
future CUDA versions. This is the same thinking behind taking advantage
of compiler and glibc feature-reporting gear, and testing functionality
at configure time, rather than hard coding specific version checks.
Not sure what you mean — hardware support is purely CUDA
version-dependent, operating systems / distros have no say in it. I
don’t think this is similar at all to glibc-related checks, CUDA has a
single implementation with clear deprecation version cutoffs.
Back to the main point: I think for this specific fix the change
proposed by Åke has the advatage of slight simplification, but the
slight drawback that it does
gmx_check_if_changed(_gmx_cuda_target_changed GMX_CUDA_NVCC_FLAGS)
which is a slight overkill in that any change in any of the C or nvcc flags will trigger a status message.
If we test for the features supported by nvcc then we will
have a good way to guess whether it is at all capable of e.g. generating
code for CC 2.0 (which Szilard identifies as a risk).
How do you suggest testing individual features of nvcc? (Writing a
bunch of execute_process calls with manual regex matching is fragile
and tedious IMO).
We wrap the execute_process() call of nvcc in a function that takes the
name of a cache variable to set with the result of the call to nvcc,
passing an optional extra flag. The current check for basic
functionality of nvcc is just the special case of no extra flag. See
e.g.
https://redmine.gromacs.org/projects/gromacs/repository/revisions/master/entry/cmake/CheckCXXCompilerFlag.cmake
and the way that cmake’s Modules/CheckCXXSourceCompiles.cmake wraps a
try_compile(). So we end up with a collection of cache variables named
like NVCC_SUPPORTS_CC_2_0 etc.
If the user chooses particular CC, then we check in cmake that
this will work. If the user chooses multiple compilation units but CC
2.0 is supported, then cmake should fail (inconsistent input). If the
user chooses nothing and CC 2.0 is supported, then we default to single
compilation unit. If the user chooses nothing and CC 2.0 is not
supported, then we default to multiple compilation units.
What we don’t check for is default behavior of nvcc, e.g. when
-gencode flags are cleared. I’ve just checked and we don’t accept
empty GMX_CUDA_TARGET_* to clear the list of targeted arch, so only
the fully manual override would be able to trigger that case (and that
can be triggered regardless of how this fix is implemented). Admittedly,
this case would however be caught by a compile-time error, so there’s no
risk of incorrect results.
I don’t see any extra value delivered by any CUDA version
checks.
Check. a Single one would be needed for this specific fix:
[…]
and this doesn’t need to change later as CC 2.0 support won’t be
re-introduced later, I think.
Yeah something like that works, but it does not address the same problem
that we will have in the future with CC 3.0 being deprecated and
removed.
Different OS first support newer CC in different CUDA
versions, so such checks are brittle. They can’t help the build work
with unknown future CUDA versions. This is the same thinking behind
taking advantage of compiler and glibc feature-reporting gear, and
testing functionality at configure time, rather than hard coding
specific version checks.
Not sure what you mean — hardware support is purely CUDA
version-dependent, operating systems / distros have no say in it.
Different CUDA versions support different compute capabilities on
different OS. See #2390. We should test the compiler for feature
capabilities.
I don’t think this is similar at all to glibc-related checks, CUDA
has a single implementation with clear deprecation version cutoffs.
They’re not clear in advance, so we are currently shipping code that
will not compile on some future version of CUDA. AFAIK there is no
published deprecation/removal plan for CC 3.0, for example. Using
feature checks on the compiler is more future proof
——
Back to the main point: I think for this specific fix the change
proposed by Åke has the advatage of slight simplification, but the
slight drawback that it does
gmx_check_if_changed(_gmx_cuda_target_changed GMX_CUDA_NVCC_FLAGS)
which is a slight overkill in that any change in any of the C or nvcc flags will trigger a status message.
(from redmine: written on 2018-07-30 by mark.j.abraham)
Szilard’s patch resolves the issue in the short term, but IMO there is
still a long term problem that has not been resolved. Let’s close this,
and I will note in #2390 that there is more consideration required,
either for 2018.x or 2019.
(from redmine: written on 2018-08-20 by mark.j.abraham)