CUDA CC 2.0 issue - Redmine #2273
Using either master HEAD, release-2016 HEAD, or tag v2016:
If I target compute and sm 20 (with -DGMX_CUDA_TARGET_SM=20 -DGMX_CUDA_TARGET_COMPUTE=20) then by default we get a single CUDA compilation unit (since that’s the only thing that can work). The regressiontests pass, but we have an issue, e.g.
$ bin/mdrun-test --gtest_filter=\*Swap\*
...
Running on 1 node with total 4 cores, 8 logical cores, 2 compatible GPUs
Hardware detected:
CPU info:
Vendor: Intel
Brand: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
SIMD instructions most likely to fit this hardware: AVX_256
SIMD instructions selected at GROMACS compile time: AVX_256
Hardware topology: Full, with devices
GPU info:
Number of GPUs detected: 2
#0: NVIDIA GeForce GTX 960, compute cap.: 5.2, ECC: no, stat: compatible
#1: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC: no, stat: compatible
Reading file /home/marklocal/git/r2016/build-cmake-gcc-gpu-cc20-debug/src/programs/mdrun/tests/Testing/Temporary/CompelTest_SwapCanRun.tpr, VERSION 2016.5-dev-20170923-d36730ca3 (single precision)
Using 1 MPI thread
Using 1 OpenMP thread
1 GPU user-selected for this run.
Mapping of GPU ID to the 1 PP rank in this node: 0
NOTE: Thread affinity setting failed. This can cause performance degradation.
If you think your settings are correct, ask on the gmx-users list.
SWAP: Determining initial numbers of ions per compartment.
SWAP: Setting pointers for checkpoint writing
SWAP: Channel 0 flux history for ion type NA+ (charge 1): 0 molecules
SWAP: Channel 1 flux history for ion type NA+ (charge 1): 0 molecules
SWAP: Channel 0 flux history for ion type CL- (charge -1): 0 molecules
SWAP: Channel 1 flux history for ion type CL- (charge -1): 0 molecules
starting mdrun 'Channel_coco in octane membrane'
2 steps, 0.0 ps.
-------------------------------------------------------
Program: mdrun-test, version 2016.5-dev-20170923-d36730ca3
Source file: src/gromacs/mdlib/nbnxn_cuda/nbnxn_cuda.cu (line 633)
Fatal error:
cudaStreamSynchronize failed in cu_blockwait_nb: an illegal memory access was
encountered
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
If I target compute and sm 30 then, by default, I get multiple CUDA
compilation units and there is no issue.
If I target compute and sm 30 and set -DGMX_CUDA_NB_SINGLE_COMPILATION_UNIT=on, then there is no issue.
So it looks to me like something in the CC 2.0 support is broken, or at least not properly used by the mdrun-test code. I’ll try to bisect a bit more and see what I learn.
The absence of a reported bug does suggest that there is not much use of release-2016 on CC 2.0, and we should consider removing support for CC 2.0 for GROMACS 2017. This would simplify our texture and CMake code, and remove the question of whether someone should try to cover this case in Jenkins. Clearly nobody has prioritized doing or automating testing on this old setup. Note that NVIDIA has already deprecated those compilation targets in nvcc (and we suppress the warning). If we go down this path, then I suggest we don’t bother trying to fix release-2016, and if someone later has an issue, suggest they use an even earlier version.
(from redmine: issue id 2273, created on 2017-10-13 by mark.j.abraham, closed on 2017-11-28)
- Changesets:
- Revision 29ba77b8 by Szilárd Páll on 2017-10-31T19:19:09Z:
Check CUDA available/compiled code compatibility
Added an early check to detect when the gmx binary does not embed code
compatible with the GPU device it tries to use nor does it have PTX that
could have been JIT-ed.
Additionally, if the user manually sets GMX_CUDA_TARGET_COMPUTE=20 and
no later SM or COMPUTE but runs on >2.0 hardware, we'd be executing
JIT-ed Fermi kernels with incorrect host-side code assumptions
(e.g. amount of shared memory allocated or texture type).
This change also prevents such cases.
Fixes #2273
Change-Id: I5472b1a33e584a75f451e21e9fd25992633fbea9
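The decision that the changeset describes can be modelled roughly as follows. This is a minimal sketch in Python, not the GROMACS implementation: the function names and return strings are hypothetical, and the rule "Fermi PTX on a device of major version 3 or later is rejected" is one reading of the commit message. It relies on two general CUDA compatibility rules: device binaries (SM targets) run only on devices of the same major revision with an equal or higher minor revision, and PTX (COMPUTE targets) can be JIT-compiled for any device of equal or higher compute capability. It also assumes that the highest usable PTX version is the one that gets JIT-compiled.

```python
from typing import Iterable, Tuple

CC = Tuple[int, int]  # (major, minor) compute capability


def parse_target(value: int) -> CC:
    """Turn a CMake-style target such as 20 or 52 into (major, minor)."""
    return divmod(value, 10)


def sass_runs_on(sm: CC, device: CC) -> bool:
    # Device binaries are compatible only within one major revision,
    # and only with the same or a later minor revision.
    return sm[0] == device[0] and sm[1] <= device[1]


def ptx_can_jit_for(compute: CC, device: CC) -> bool:
    # PTX can be JIT-compiled for any device of equal or higher capability.
    return compute <= device


def check_compatibility(sm_targets: Iterable[int],
                        compute_targets: Iterable[int],
                        device: CC) -> str:
    """Hypothetical model of the early check described in the commit message."""
    sms = [parse_target(t) for t in sm_targets]
    computes = [parse_target(t) for t in compute_targets]

    # Case 1: the binary embeds device code that runs natively.
    if any(sass_runs_on(sm, device) for sm in sms):
        return "ok"

    # Case 2: no native code and no PTX that could be JIT-ed.
    usable_ptx = [c for c in computes if ptx_can_jit_for(c, device)]
    if not usable_ptx:
        return "no compatible code"

    # Case 3 (assumption): only Fermi PTX is available, but the device is
    # newer, so JIT-ed kernels would not match host-side assumptions.
    if device[0] >= 3 and max(usable_ptx)[0] < 3:
        return "fermi ptx on newer device"

    return "ok"


if __name__ == "__main__":
    # The configuration from this report: SM=20, COMPUTE=20 on a CC 5.2 GPU.
    print(check_compatibility([20], [20], (5, 2)))
    # Targeting 30 instead, as in the configurations that showed no issue.
    print(check_compatibility([30], [30], (5, 2)))
```

Under this model, the failing configuration from the report (targets 20/20 on the GTX 960) is caught before any kernel launch, while the 30/30 build is accepted because its PTX can be JIT-compiled for the newer device.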