EigenContractionKernel causes a PTX error (uses too much shared memory)
Submitted by xiah
Assigned to Nobody
Link to original bugzilla bug (#1212)
Version: 3.3 (current stable)
Platform: GPU (CUDA)
Description
I changed the code as follows to reduce the shared memory usage:
// __shared__ volatile Scalar lhs_shmem[72 * 64];
// __shared__ volatile Scalar rhs_shmem[72 * 64];
__shared__ volatile Scalar lhs_shmem[72 * 32];
__shared__ volatile Scalar rhs_shmem[72 * 32];
With this change it compiles without errors. Is it possible to redesign the kernel to reduce its shared memory usage? NVIDIA GPUs usually have at most 48 KB of shared memory per thread block.
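For reference, here is a rough check of the arithmetic behind the two tile sizes. It is only a sketch: it assumes Scalar is either float or double, takes the 48 KB figure above as the limit, and ignores any other shared memory the kernel may use. The names kSharedLimitBytes, tile_bytes and fits are made up for this example and are not part of Eigen.

```cpp
#include <cstddef>

// Shared memory limit per thread block, taken from the report (assumption: 48 KB).
constexpr std::size_t kSharedLimitBytes = 48 * 1024;

// Bytes needed by the two tiles lhs_shmem[rows * cols] and rhs_shmem[rows * cols].
// Hypothetical helper, not an Eigen function.
template <typename Scalar>
constexpr std::size_t tile_bytes(std::size_t rows, std::size_t cols) {
  return 2 * rows * cols * sizeof(Scalar);
}

// True if both tiles together fit under the assumed limit.
template <typename Scalar>
constexpr bool fits(std::size_t rows, std::size_t cols) {
  return tile_bytes<Scalar>(rows, cols) <= kSharedLimitBytes;
}

// Original size, 72 * 64 elements per tile.
static_assert(fits<float>(72, 64), "float: 36864 bytes, under the limit");
static_assert(!fits<double>(72, 64), "double: 73728 bytes, over the limit");

// Reduced size, 72 * 32 elements per tile.
static_assert(fits<double>(72, 32), "double: 36864 bytes, under the limit");
```

If this is right, the original 72 * 64 tiles only exceed 48 KB when Scalar is an 8-byte type such as double, which would explain why halving the tiles makes the error go away.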