CUDA: Calling some Eigen methods fails with "Address 0x00000000 is out of bounds"
Submitted by Ali Nakipoglu
Assigned to Nobody
Link to original bugzilla bug (#1415)
Version: 3.3 (current stable)
Operating system: Linux
Description
Created attachment 784
Test Project Directory
Hi,
To decompose and compose transform matrices in CUDA kernels, I am using the Transform class. Calling Transform<>::fromPositionOrientationScale or Transform<>::computeRotationScaling causes my application to freeze. Running cuda-memcheck prints the following report:
========= Invalid global read of size 4
========= at 0x00000050 in kernel(unsigned long, Eigen::Matrix<float, int=4, int=4, int=2, int=4, int=4>*)
========= by thread (0,0,0) in block (0,0,0)
========= Address 0x00000000 is out of bounds
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:/usr/lib64/libcuda.so.1 (cuLaunchKernel + 0x2c5) [0x204235]
========= Host Frame:libcudart.so.8.0 [0xd23d]
========= Host Frame:libcudart.so.8.0 (cudaLaunch + 0x143) [0x33783]
========= Host Frame:EigenTest [0xf4d]
========= Host Frame:EigenTest (_Z63__device_stub__Z6kernelmPN5Eigen6MatrixIfLi4ELi4ELi2ELi4ELi4EEEmPN5Eigen6MatrixIfLi4ELi4ELi2ELi4ELi4EEE + 0x67) [0xe53]
========= Host Frame:EigenTest (_Z6kernelmPN5Eigen6MatrixIfLi4ELi4ELi2ELi4ELi4EEE + 0x23) [0xe7e]
========= Host Frame:EigenTest (_Z4testv + 0x91) [0xd86]
========= Host Frame:EigenTest (main + 0x9) [0xcd9]
========= Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xfd) [0x1ed1d]
========= Host Frame:EigenTest [0xc09]
========= CUDA-MEMCHECK
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib64/libcuda.so.1 [0x2ef503]
========= Host Frame:libcudart.so.8.0 (cudaDeviceSynchronize + 0x166) [0x334a6]
========= Host Frame:EigenTest (_Z4testv + 0x96) [0xd8b]
========= Host Frame:EigenTest (main + 0x9) [0xcd9]
========= Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xfd) [0x1ed1d]
========= Host Frame:EigenTest [0xc09]
========= Program hit cudaErrorLaunchFailure (error 4) due to "unspecified launch failure" on CUDA API call to cudaDeviceSynchronize.

My system has CUDA 8.0.44 and a Quadro M5000. My guess is that somewhere inside Eigen a host-only function is being called from device code, but I wasn't able to find it. It should be easy to reproduce: my application froze every single time. I also checked the available memory before the test; 62 GB of system memory and 8 GB of device memory were free.
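For reference, a kernel of the kind described above might look like the following sketch. It is not the attached EigenTest project, whose source is not reproduced here; the aliases Mat4 and Affine3, the single-thread launch and the use of managed memory are assumptions made for illustration. The matrix type mirrors the Eigen::Matrix<float, 4, 4, 2, 4, 4> in the kernel signature from the memcheck log, where Options = 2 corresponds to Eigen::DontAlign.

```cuda
// Hypothetical reproduction sketch; NOT the attached EigenTest project.
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>
#include <Eigen/Dense>

// Unaligned 4x4 float matrix (Options = 2 = Eigen::DontAlign), as in the memcheck log.
typedef Eigen::Matrix<float, 4, 4, Eigen::DontAlign> Mat4;
// Assumed transform type used to wrap each matrix inside the kernel.
typedef Eigen::Transform<float, 3, Eigen::Affine, Eigen::DontAlign> Affine3;

__global__ void kernel(std::size_t n, Mat4* mats)
{
    std::size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    Affine3 t;
    t.matrix() = mats[i];

    // Decompose the linear part into rotation * scaling (polar decomposition).
    Eigen::Matrix3f rotation, scaling;
    t.computeRotationScaling(&rotation, &scaling);

    // Recompose; using the diagonal of 'scaling' assumes the transform has no shear.
    Eigen::Vector3f position = t.translation();
    Eigen::Quaternionf orientation(rotation);
    Eigen::Vector3f scale = scaling.diagonal();
    t.fromPositionOrientationScale(position, orientation, scale);

    mats[i] = t.matrix();
}

int main()
{
    const std::size_t n = 1;
    Mat4* mats = nullptr;
    cudaMallocManaged(&mats, n * sizeof(Mat4));
    mats[0] = Mat4::Identity();

    kernel<<<1, 1>>>(n, mats);
    // Launch errors are reported by cudaGetLastError; faults during execution
    // surface at the next synchronizing call.
    cudaError_t launchErr = cudaGetLastError();
    cudaError_t syncErr = cudaDeviceSynchronize();
    std::printf("launch: %s, sync: %s\n",
                cudaGetErrorString(launchErr), cudaGetErrorString(syncErr));

    cudaFree(mats);
    return syncErr == cudaSuccess ? 0 : 1;
}
```

Building this with nvcc and running it under cuda-memcheck should show whether the same invalid global read occurs. Separating the launch error from the synchronization error helps tell a bad launch apart from a fault raised while the kernel runs, which is where a host-only code path reached from device code would show up.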
I have attached the project for your reference.
Many thanks
Attachment 784, "Test Project Directory":
EigenTest.tar.gz