Fix GPU conv3d out-of-resources failure.
The conv3d kernel appears to be highly sensitive to the size of its internal
variables: it needs 32-bit int variables to avoid running out of resources.
This fixes the cxx11_tensor_device_2 and cxx11_tensor_gpu_3 tests.
This is a partial reversion of !1192.