Different warp size on AMD/ROCm
"Note that Nvidia and AMD devices have different warp sizes, so portable code should use the warpSize built-ins to query the warp size. Hipified code from the Cuda path requires careful review to ensure it doesn’t assume a waveSize of 32. “Wave-aware” code that assumes a waveSize of 32 will run on a wave-64 machine, but it will utilize only half of the machine resources." [1]
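A minimal host-side sketch of what "not assuming 32" means for index arithmetic. The helper names (`LanePosition`, `lanePosition`, `warpsPerBlock`) are hypothetical and not taken from our code base. In a real kernel the `warpSize` argument would come from the `warpSize` built-in; here it is an ordinary parameter so the example runs on the CPU.

```cpp
#include <cassert>

// Position of a thread inside its block, expressed in warp/wavefront terms.
struct LanePosition {
    int warp;  // index of the warp (wavefront) within the block
    int lane;  // index of the thread within its warp (wavefront)
};

// Derive warp and lane indices from the thread index and the warp size,
// instead of hard-coding threadIdx / 32 and threadIdx % 32.
inline LanePosition lanePosition(int threadIdx, int warpSize)
{
    assert(warpSize > 0);
    return LanePosition{ threadIdx / warpSize, threadIdx % warpSize };
}

// Number of warps (wavefronts) needed to cover a block, rounded up.
// This is also the number of per-warp partial results a block produces.
inline int warpsPerBlock(int blockSize, int warpSize)
{
    assert(warpSize > 0);
    return (blockSize + warpSize - 1) / warpSize;
}
```

For a block of 256 threads this gives 8 warps when the warp size is 32 and 4 wavefronts when it is 64, so any buffer or loop bound derived from a literal 32 no longer matches the hardware layout on a wave-64 device.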
We need to check:
- parallel reduction
- CSR light kernels
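For the parallel reduction item, the pattern to look for is a shuffle-based tree reduction whose first offset is a literal 16. The sketch below is a CPU model of that pattern, not our actual kernel: `reduceLane0` is a hypothetical helper, and the inner loop only imitates a shuffle-down step under the assumption that a lane whose source is out of range reads back its own value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU model of a shuffle-down tree reduction over one warp/wavefront.
// `lanes` holds one value per lane; `startOffset` is the first shuffle
// distance (warpSize / 2 in portable code, a literal 16 in code that
// assumes 32 lanes). Returns the value that ends up in lane 0.
inline int reduceLane0(std::vector<int> lanes, std::size_t startOffset)
{
    assert(!lanes.empty());
    const std::size_t width = lanes.size();
    for (std::size_t offset = startOffset; offset > 0; offset /= 2) {
        // All lanes read their source before any lane writes,
        // so work on a snapshot of the previous step.
        const std::vector<int> old = lanes;
        for (std::size_t i = 0; i < width; ++i) {
            const std::size_t src = i + offset;
            // Assumption: an out-of-range source yields the lane's own value.
            lanes[i] = old[i] + (src < width ? old[src] : old[i]);
        }
    }
    return lanes[0];
}
```

With 64 lanes holding the values 0..63, starting at offset 32 leaves the full sum 2016 in lane 0, while starting at offset 16 leaves only 496, the sum of lanes 0..31. Code that consistently treats each wavefront as two groups of 32 can still produce correct results, but any place that mixes a literal 32 with the real wavefront size will combine the wrong partial sums.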
Edited by Tomas Oberhuber